Search Results: "rene"

25 September 2023

Michael Prokop: Postfix failing with no shared cipher

I'm one of the few folks left who run and maintain mail servers. Recently I had major troubles receiving mails from the mail servers used by a bank, and when asking my favourite search engine, I'm clearly not the only one who ran into such an issue. Actually, I should have written off the issue and not become a customer at that bank, but the tech nerd in me couldn't resist getting to the bottom of the problem. Since I got it working and this might be useful for others, here we are. :) I was trying to get an online banking account set up, but the corresponding account creation mail never arrived. Looking at my mail server logs, my postfix mail server didn't accept the mail due to:
postfix/smtpd[3319640]: warning: TLS library problem: error:1417A0C1:SSL routines:tls_post_process_client_hello:no shared cipher:../ssl/statem/statem_srvr.c:2283:
postfix/smtpd[3319640]: lost connection after STARTTLS from mx01.arz.at[193.110.182.61]
Huh, what's going on here?! Let's increase the TLS loglevel (setting smtpd_tls_loglevel = 2) and retry. But how can I retry receiving yet another mail? Luckily, the registration website of the bank provided a URL that let me request a one-time password. This triggered another mail, so I did that and managed to grab this in the logs:
postfix/smtpd[3320018]: initializing the server-side TLS engine
postfix/tlsmgr[3320020]: open smtpd TLS cache btree:/var/lib/postfix/smtpd_scache
postfix/tlsmgr[3320020]: tlsmgr_cache_run_event: start TLS smtpd session cache cleanup
postfix/smtpd[3320018]: connect from mx01.arz.at[193.110.182.61]
postfix/smtpd[3320018]: setting up TLS connection from mx01.arz.at[193.110.182.61]
postfix/smtpd[3320018]: mx01.arz.at[193.110.182.61]: TLS cipher list "aNULL:-aNULL:HIGH:MEDIUM:+RC4:@STRENGTH"
postfix/smtpd[3320018]: SSL_accept:before SSL initialization
postfix/smtpd[3320018]: SSL_accept:before SSL initialization
postfix/smtpd[3320018]: SSL3 alert write:fatal:handshake failure
postfix/smtpd[3320018]: SSL_accept:error in error
postfix/smtpd[3320018]: SSL_accept error from mx01.arz.at[193.110.182.61]: -1
postfix/smtpd[3320018]: warning: TLS library problem: error:1417A0C1:SSL routines:tls_post_process_client_hello:no shared cipher:../ssl/statem/statem_srvr.c:2283:
postfix/smtpd[3320018]: lost connection after STARTTLS from mx01.arz.at[193.110.182.61]
postfix/smtpd[3320018]: disconnect from mx01.arz.at[193.110.182.61] ehlo=1 starttls=0/1 commands=1/2
postfix/smtpd[3320018]: connect from mx01.arz.at[193.110.182.61]
postfix/smtpd[3320018]: disconnect from mx01.arz.at[193.110.182.61] ehlo=1 quit=1 commands=2
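For reference, the verbose handshake output above comes from the increased TLS log level; a minimal sketch of setting and applying it, assuming postconf and systemd are in use:
% sudo postconf -e 'smtpd_tls_loglevel = 2'
% sudo systemctl reload postfix
Level 2 logs every TLS handshake in detail, so it's worth reverting once done.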
Ok, so this TLS cipher list aNULL:-aNULL:HIGH:MEDIUM:+RC4:@STRENGTH looked like the tls_medium_cipherlist setting in postfix, but which ciphers might we expect? Let's see what their SMTP server is willing to speak with us:
% testssl --cipher-per-proto -t=smtp mx01.arz.at:25
[...]
Hexcode  Cipher Suite Name (OpenSSL)       KeyExch.   Encryption  Bits     Cipher Suite Name (IANA/RFC)
-----------------------------------------------------------------------------------------------------------------------------
SSLv2
SSLv3
TLS 1
TLS 1.1
TLS 1.2
 xc030   ECDHE-RSA-AES256-GCM-SHA384       ECDH 256   AESGCM      256      TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
 xc028   ECDHE-RSA-AES256-SHA384           ECDH 256   AES         256      TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384
 xc014   ECDHE-RSA-AES256-SHA              ECDH 256   AES         256      TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
 x9d     AES256-GCM-SHA384                 RSA        AESGCM      256      TLS_RSA_WITH_AES_256_GCM_SHA384
 x3d     AES256-SHA256                     RSA        AES         256      TLS_RSA_WITH_AES_256_CBC_SHA256
 x35     AES256-SHA                        RSA        AES         256      TLS_RSA_WITH_AES_256_CBC_SHA
 xc02f   ECDHE-RSA-AES128-GCM-SHA256       ECDH 256   AESGCM      128      TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
 xc027   ECDHE-RSA-AES128-SHA256           ECDH 256   AES         128      TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256
 xc013   ECDHE-RSA-AES128-SHA              ECDH 256   AES         128      TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA
 x9c     AES128-GCM-SHA256                 RSA        AESGCM      128      TLS_RSA_WITH_AES_128_GCM_SHA256
 x3c     AES128-SHA256                     RSA        AES         128      TLS_RSA_WITH_AES_128_CBC_SHA256
 x2f     AES128-SHA                        RSA        AES         128      TLS_RSA_WITH_AES_128_CBC_SHA
TLS 1.3
Looks like a very small subset of ciphers, and the server doesn't seem to be talking TLS v1.3 at all? Not great. :( A nice web service to verify the situation from another point of view is checktls, which also confirmed this:
[000.705] 	<-- 	220 2.0.0 Ready to start TLS
[000.705] 		STARTTLS command works on this server
[001.260] 		Connection converted to SSL
		SSLVersion in use: TLSv1_2
		Cipher in use: ECDHE-RSA-AES256-GCM-SHA384
		Perfect Forward Secrecy: yes
		Session Algorithm in use: Curve P-256 DHE(256 bits)
		Certificate #1 of 3 (sent by MX):
		Cert VALIDATED: ok
		Cert Hostname VERIFIED (mx01.arz.at = *.arz.at | DNS:*.arz.at | DNS:arz.at)
[...]
[001.517] 		TLS successfully started on this server
I got distracted by some other work, and when coming back to this problem, the one-time password procedure no longer worked, as the password reset URL was no longer valid. :( I managed to find the underlying URL, and with some web developer tools tinkering I could still use the website to trigger sending further one-time password mails, phew. Let's continue: my mail server was running Debian/bullseye with postfix v3.5.18-0+deb11u1 and openssl v1.1.1n-0+deb11u5, so let's see what it offers:
% testssl --cipher-per-proto -t=smtp mail.example.com:25
[...]
Hexcode  Cipher Suite Name (OpenSSL)       KeyExch.   Encryption  Bits     Cipher Suite Name (IANA/RFC)
-----------------------------------------------------------------------------------------------------------------------------
SSLv2
SSLv3
TLS 1
 xc00a   ECDHE-ECDSA-AES256-SHA            ECDH 253   AES         256      TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA
 xc019   AECDH-AES256-SHA                  ECDH 253   AES         256      TLS_ECDH_anon_WITH_AES_256_CBC_SHA
 x3a     ADH-AES256-SHA                    DH 2048    AES         256      TLS_DH_anon_WITH_AES_256_CBC_SHA
 x89     ADH-CAMELLIA256-SHA               DH 2048    Camellia    256      TLS_DH_anon_WITH_CAMELLIA_256_CBC_SHA
 xc009   ECDHE-ECDSA-AES128-SHA            ECDH 253   AES         128      TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA
 xc018   AECDH-AES128-SHA                  ECDH 253   AES         128      TLS_ECDH_anon_WITH_AES_128_CBC_SHA
 x34     ADH-AES128-SHA                    DH 2048    AES         128      TLS_DH_anon_WITH_AES_128_CBC_SHA
 x9b     ADH-SEED-SHA                      DH 2048    SEED        128      TLS_DH_anon_WITH_SEED_CBC_SHA
 x46     ADH-CAMELLIA128-SHA               DH 2048    Camellia    128      TLS_DH_anon_WITH_CAMELLIA_128_CBC_SHA
TLS 1.1
 xc00a   ECDHE-ECDSA-AES256-SHA            ECDH 253   AES         256      TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA
 xc019   AECDH-AES256-SHA                  ECDH 253   AES         256      TLS_ECDH_anon_WITH_AES_256_CBC_SHA
 x3a     ADH-AES256-SHA                    DH 2048    AES         256      TLS_DH_anon_WITH_AES_256_CBC_SHA
 x89     ADH-CAMELLIA256-SHA               DH 2048    Camellia    256      TLS_DH_anon_WITH_CAMELLIA_256_CBC_SHA
 xc009   ECDHE-ECDSA-AES128-SHA            ECDH 253   AES         128      TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA
 xc018   AECDH-AES128-SHA                  ECDH 253   AES         128      TLS_ECDH_anon_WITH_AES_128_CBC_SHA
 x34     ADH-AES128-SHA                    DH 2048    AES         128      TLS_DH_anon_WITH_AES_128_CBC_SHA
 x9b     ADH-SEED-SHA                      DH 2048    SEED        128      TLS_DH_anon_WITH_SEED_CBC_SHA
 x46     ADH-CAMELLIA128-SHA               DH 2048    Camellia    128      TLS_DH_anon_WITH_CAMELLIA_128_CBC_SHA
TLS 1.2
 xc02c   ECDHE-ECDSA-AES256-GCM-SHA384     ECDH 253   AESGCM      256      TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
 xc024   ECDHE-ECDSA-AES256-SHA384         ECDH 253   AES         256      TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384
 xc00a   ECDHE-ECDSA-AES256-SHA            ECDH 253   AES         256      TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA
 xcca9   ECDHE-ECDSA-CHACHA20-POLY1305     ECDH 253   ChaCha20    256      TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256
 xc0af   ECDHE-ECDSA-AES256-CCM8           ECDH 253   AESCCM8     256      TLS_ECDHE_ECDSA_WITH_AES_256_CCM_8
 xc0ad   ECDHE-ECDSA-AES256-CCM            ECDH 253   AESCCM      256      TLS_ECDHE_ECDSA_WITH_AES_256_CCM
 xc073   ECDHE-ECDSA-CAMELLIA256-SHA384    ECDH 253   Camellia    256      TLS_ECDHE_ECDSA_WITH_CAMELLIA_256_CBC_SHA384
 xc019   AECDH-AES256-SHA                  ECDH 253   AES         256      TLS_ECDH_anon_WITH_AES_256_CBC_SHA
 xa7     ADH-AES256-GCM-SHA384             DH 2048    AESGCM      256      TLS_DH_anon_WITH_AES_256_GCM_SHA384
 x6d     ADH-AES256-SHA256                 DH 2048    AES         256      TLS_DH_anon_WITH_AES_256_CBC_SHA256
 x3a     ADH-AES256-SHA                    DH 2048    AES         256      TLS_DH_anon_WITH_AES_256_CBC_SHA
 xc5     ADH-CAMELLIA256-SHA256            DH 2048    Camellia    256      TLS_DH_anon_WITH_CAMELLIA_256_CBC_SHA256
 x89     ADH-CAMELLIA256-SHA               DH 2048    Camellia    256      TLS_DH_anon_WITH_CAMELLIA_256_CBC_SHA
 xc05d   ECDHE-ECDSA-ARIA256-GCM-SHA384    ECDH 253   ARIAGCM     256      TLS_ECDHE_ECDSA_WITH_ARIA_256_GCM_SHA384
 xc02b   ECDHE-ECDSA-AES128-GCM-SHA256     ECDH 253   AESGCM      128      TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
 xc023   ECDHE-ECDSA-AES128-SHA256         ECDH 253   AES         128      TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256
 xc009   ECDHE-ECDSA-AES128-SHA            ECDH 253   AES         128      TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA
 xc0ae   ECDHE-ECDSA-AES128-CCM8           ECDH 253   AESCCM8     128      TLS_ECDHE_ECDSA_WITH_AES_128_CCM_8
 xc0ac   ECDHE-ECDSA-AES128-CCM            ECDH 253   AESCCM      128      TLS_ECDHE_ECDSA_WITH_AES_128_CCM
 xc072   ECDHE-ECDSA-CAMELLIA128-SHA256    ECDH 253   Camellia    128      TLS_ECDHE_ECDSA_WITH_CAMELLIA_128_CBC_SHA256
 xc018   AECDH-AES128-SHA                  ECDH 253   AES         128      TLS_ECDH_anon_WITH_AES_128_CBC_SHA
 xa6     ADH-AES128-GCM-SHA256             DH 2048    AESGCM      128      TLS_DH_anon_WITH_AES_128_GCM_SHA256
 x6c     ADH-AES128-SHA256                 DH 2048    AES         128      TLS_DH_anon_WITH_AES_128_CBC_SHA256
 x34     ADH-AES128-SHA                    DH 2048    AES         128      TLS_DH_anon_WITH_AES_128_CBC_SHA
 xbf     ADH-CAMELLIA128-SHA256            DH 2048    Camellia    128      TLS_DH_anon_WITH_CAMELLIA_128_CBC_SHA256
 x9b     ADH-SEED-SHA                      DH 2048    SEED        128      TLS_DH_anon_WITH_SEED_CBC_SHA
 x46     ADH-CAMELLIA128-SHA               DH 2048    Camellia    128      TLS_DH_anon_WITH_CAMELLIA_128_CBC_SHA
 xc05c   ECDHE-ECDSA-ARIA128-GCM-SHA256    ECDH 253   ARIAGCM     128      TLS_ECDHE_ECDSA_WITH_ARIA_128_GCM_SHA256
TLS 1.3
 x1302   TLS_AES_256_GCM_SHA384            ECDH 253   AESGCM      256      TLS_AES_256_GCM_SHA384
 x1303   TLS_CHACHA20_POLY1305_SHA256      ECDH 253   ChaCha20    256      TLS_CHACHA20_POLY1305_SHA256
 x1301   TLS_AES_128_GCM_SHA256            ECDH 253   AESGCM      128      TLS_AES_128_GCM_SHA256
Not so bad, but sadly no overlap with any of the ciphers that mx01.arz.at offers. What about disabling STARTTLS for the mx01.arz.at and mx02.arz.at mail servers (the two used by the relevant domain) when they talk to mine? Let's try that:
% sudo postconf -nf smtpd_discard_ehlo_keyword_address_maps
smtpd_discard_ehlo_keyword_address_maps =
    hash:/etc/postfix/smtpd_discard_ehlo_keywords
% cat /etc/postfix/smtpd_discard_ehlo_keywords
# *disable* starttls for mx01.arz.at / mx02.arz.at:
193.110.182.61 starttls
193.110.182.62 starttls
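Note that a hash: table must first be compiled into its .db form with postmap, otherwise postfix won't pick up the entries:
% sudo postmap /etc/postfix/smtpd_discard_ehlo_keywords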
But the remote mail server doesn't seem to send mails without TLS:
postfix/smtpd[4151799]: connect from mx01.arz.at[193.110.182.61]
postfix/smtpd[4151799]: discarding EHLO keywords: STARTTLS
postfix/smtpd[4151799]: disconnect from mx01.arz.at[193.110.182.61] ehlo=1 quit=1 commands=2
Let's verify this further, but without fiddling with the main mail server too much. We can add a dedicated service to postfix (see serverfault), and run it in verbose mode to get more detailed logging:
% sudo postconf -Mf
[...]
10025      inet  n       -       -       -       -       smtpd
    -o syslog_name=postfix/smtpd/badstarttls
    -o smtpd_tls_security_level=none
    -o smtpd_helo_required=yes
    -o smtpd_helo_restrictions=pcre:/etc/postfix/helo_badstarttls_allow,reject
    -v
[...]
% cat /etc/postfix/helo_badstarttls_allow
/mx01.arz.at/ OK
/mx02.arz.at/ OK
/193.110.182.61/ OK
/193.110.182.62/ OK
We redirect the traffic from mx01.arz.at + mx02.arz.at towards our new postfix service, listening on port 10025:
% sudo iptables -t nat -A PREROUTING -p tcp -s 193.110.182.61 --dport 25 -j REDIRECT --to-port 10025
% sudo iptables -t nat -A PREROUTING -p tcp -s 193.110.182.62 --dport 25 -j REDIRECT --to-port 10025
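To undo this redirect later, the same rules can be deleted by swapping -A for -D:
% sudo iptables -t nat -D PREROUTING -p tcp -s 193.110.182.61 --dport 25 -j REDIRECT --to-port 10025
% sudo iptables -t nat -D PREROUTING -p tcp -s 193.110.182.62 --dport 25 -j REDIRECT --to-port 10025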
With this setup we get very detailed logging, and it seems to confirm our suspicion that the mail server doesn't want to talk unencrypted with us:
[...]
postfix/smtpd/badstarttls/smtpd[3491900]: connect from mx01.arz.at[193.110.182.61]
[...]
postfix/smtpd/badstarttls/smtpd[3491901]: disconnect from mx01.arz.at[193.110.182.61] ehlo=1 quit=1 commands=2
postfix/smtpd/badstarttls/smtpd[3491901]: master_notify: status 1
postfix/smtpd/badstarttls/smtpd[3491901]: connection closed
[...]
Let's step back and revert those changes, back to our original postfix setup. Might the problem be related to our Let's Encrypt certificate? Let's see what we have:
% echo QUIT | openssl s_client -connect mail.example.com:25 -starttls smtp
[...]
issuer=C = US, O = Let's Encrypt, CN = R3
---
No client certificate CA names sent
Peer signing digest: SHA384
Peer signature type: ECDSA
Server Temp Key: X25519, 253 bits
---
SSL handshake has read 4455 bytes and written 427 bytes
Verification: OK
---
New, TLSv1.3, Cipher is TLS_AES_256_GCM_SHA384
Server public key is 384 bit
[...]
We have an ECDSA-based certificate, so what about switching to RSA instead? Thanks to the wonderful dehydrated, this is as easy as:
% echo KEY_ALGO=rsa > certs/mail.example.com/config
% ./dehydrated -c --domain mail.example.com --force
% sudo systemctl reload postfix
After switching to an RSA key we get:
% echo QUIT | openssl s_client -connect mail.example.com:25 -starttls smtp
CONNECTED(00000003)
[...]
issuer=C = US, O = Let's Encrypt, CN = R3
---
No client certificate CA names sent
Peer signing digest: SHA256
Peer signature type: RSA-PSS
Server Temp Key: X25519, 253 bits
---
SSL handshake has read 5295 bytes and written 427 bytes
Verification: OK
---
New, TLSv1.3, Cipher is TLS_AES_256_GCM_SHA384
Server public key is 4096 bit
Which ciphers do we offer now? Let's check:
% testssl --cipher-per-proto -t=smtp mail.example.com:25
[...]
Hexcode  Cipher Suite Name (OpenSSL)       KeyExch.   Encryption  Bits     Cipher Suite Name (IANA/RFC)
-----------------------------------------------------------------------------------------------------------------------------
SSLv2
SSLv3
TLS 1
 xc014   ECDHE-RSA-AES256-SHA              ECDH 253   AES         256      TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
 x39     DHE-RSA-AES256-SHA                DH 2048    AES         256      TLS_DHE_RSA_WITH_AES_256_CBC_SHA
 x88     DHE-RSA-CAMELLIA256-SHA           DH 2048    Camellia    256      TLS_DHE_RSA_WITH_CAMELLIA_256_CBC_SHA
 xc019   AECDH-AES256-SHA                  ECDH 253   AES         256      TLS_ECDH_anon_WITH_AES_256_CBC_SHA
 x3a     ADH-AES256-SHA                    DH 2048    AES         256      TLS_DH_anon_WITH_AES_256_CBC_SHA
 x89     ADH-CAMELLIA256-SHA               DH 2048    Camellia    256      TLS_DH_anon_WITH_CAMELLIA_256_CBC_SHA
 x35     AES256-SHA                        RSA        AES         256      TLS_RSA_WITH_AES_256_CBC_SHA
 x84     CAMELLIA256-SHA                   RSA        Camellia    256      TLS_RSA_WITH_CAMELLIA_256_CBC_SHA
 xc013   ECDHE-RSA-AES128-SHA              ECDH 253   AES         128      TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA
 x33     DHE-RSA-AES128-SHA                DH 2048    AES         128      TLS_DHE_RSA_WITH_AES_128_CBC_SHA
 x9a     DHE-RSA-SEED-SHA                  DH 2048    SEED        128      TLS_DHE_RSA_WITH_SEED_CBC_SHA
 x45     DHE-RSA-CAMELLIA128-SHA           DH 2048    Camellia    128      TLS_DHE_RSA_WITH_CAMELLIA_128_CBC_SHA
 xc018   AECDH-AES128-SHA                  ECDH 253   AES         128      TLS_ECDH_anon_WITH_AES_128_CBC_SHA
 x34     ADH-AES128-SHA                    DH 2048    AES         128      TLS_DH_anon_WITH_AES_128_CBC_SHA
 x9b     ADH-SEED-SHA                      DH 2048    SEED        128      TLS_DH_anon_WITH_SEED_CBC_SHA
 x46     ADH-CAMELLIA128-SHA               DH 2048    Camellia    128      TLS_DH_anon_WITH_CAMELLIA_128_CBC_SHA
 x2f     AES128-SHA                        RSA        AES         128      TLS_RSA_WITH_AES_128_CBC_SHA
 x96     SEED-SHA                          RSA        SEED        128      TLS_RSA_WITH_SEED_CBC_SHA
 x41     CAMELLIA128-SHA                   RSA        Camellia    128      TLS_RSA_WITH_CAMELLIA_128_CBC_SHA
TLS 1.1
 xc014   ECDHE-RSA-AES256-SHA              ECDH 253   AES         256      TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
 x39     DHE-RSA-AES256-SHA                DH 2048    AES         256      TLS_DHE_RSA_WITH_AES_256_CBC_SHA
 x88     DHE-RSA-CAMELLIA256-SHA           DH 2048    Camellia    256      TLS_DHE_RSA_WITH_CAMELLIA_256_CBC_SHA
 xc019   AECDH-AES256-SHA                  ECDH 253   AES         256      TLS_ECDH_anon_WITH_AES_256_CBC_SHA
 x3a     ADH-AES256-SHA                    DH 2048    AES         256      TLS_DH_anon_WITH_AES_256_CBC_SHA
 x89     ADH-CAMELLIA256-SHA               DH 2048    Camellia    256      TLS_DH_anon_WITH_CAMELLIA_256_CBC_SHA
 x35     AES256-SHA                        RSA        AES         256      TLS_RSA_WITH_AES_256_CBC_SHA
 x84     CAMELLIA256-SHA                   RSA        Camellia    256      TLS_RSA_WITH_CAMELLIA_256_CBC_SHA
 xc013   ECDHE-RSA-AES128-SHA              ECDH 253   AES         128      TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA
 x33     DHE-RSA-AES128-SHA                DH 2048    AES         128      TLS_DHE_RSA_WITH_AES_128_CBC_SHA
 x9a     DHE-RSA-SEED-SHA                  DH 2048    SEED        128      TLS_DHE_RSA_WITH_SEED_CBC_SHA
 x45     DHE-RSA-CAMELLIA128-SHA           DH 2048    Camellia    128      TLS_DHE_RSA_WITH_CAMELLIA_128_CBC_SHA
 xc018   AECDH-AES128-SHA                  ECDH 253   AES         128      TLS_ECDH_anon_WITH_AES_128_CBC_SHA
 x34     ADH-AES128-SHA                    DH 2048    AES         128      TLS_DH_anon_WITH_AES_128_CBC_SHA
 x9b     ADH-SEED-SHA                      DH 2048    SEED        128      TLS_DH_anon_WITH_SEED_CBC_SHA
 x46     ADH-CAMELLIA128-SHA               DH 2048    Camellia    128      TLS_DH_anon_WITH_CAMELLIA_128_CBC_SHA
 x2f     AES128-SHA                        RSA        AES         128      TLS_RSA_WITH_AES_128_CBC_SHA
 x96     SEED-SHA                          RSA        SEED        128      TLS_RSA_WITH_SEED_CBC_SHA
 x41     CAMELLIA128-SHA                   RSA        Camellia    128      TLS_RSA_WITH_CAMELLIA_128_CBC_SHA
TLS 1.2
 xc030   ECDHE-RSA-AES256-GCM-SHA384       ECDH 253   AESGCM      256      TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
 xc028   ECDHE-RSA-AES256-SHA384           ECDH 253   AES         256      TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384
 xc014   ECDHE-RSA-AES256-SHA              ECDH 253   AES         256      TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
 x9f     DHE-RSA-AES256-GCM-SHA384         DH 2048    AESGCM      256      TLS_DHE_RSA_WITH_AES_256_GCM_SHA384
 xcca8   ECDHE-RSA-CHACHA20-POLY1305       ECDH 253   ChaCha20    256      TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
 xccaa   DHE-RSA-CHACHA20-POLY1305         DH 2048    ChaCha20    256      TLS_DHE_RSA_WITH_CHACHA20_POLY1305_SHA256
 xc0a3   DHE-RSA-AES256-CCM8               DH 2048    AESCCM8     256      TLS_DHE_RSA_WITH_AES_256_CCM_8
 xc09f   DHE-RSA-AES256-CCM                DH 2048    AESCCM      256      TLS_DHE_RSA_WITH_AES_256_CCM
 x6b     DHE-RSA-AES256-SHA256             DH 2048    AES         256      TLS_DHE_RSA_WITH_AES_256_CBC_SHA256
 x39     DHE-RSA-AES256-SHA                DH 2048    AES         256      TLS_DHE_RSA_WITH_AES_256_CBC_SHA
 xc077   ECDHE-RSA-CAMELLIA256-SHA384      ECDH 253   Camellia    256      TLS_ECDHE_RSA_WITH_CAMELLIA_256_CBC_SHA384
 xc4     DHE-RSA-CAMELLIA256-SHA256        DH 2048    Camellia    256      TLS_DHE_RSA_WITH_CAMELLIA_256_CBC_SHA256
 x88     DHE-RSA-CAMELLIA256-SHA           DH 2048    Camellia    256      TLS_DHE_RSA_WITH_CAMELLIA_256_CBC_SHA
 xc019   AECDH-AES256-SHA                  ECDH 253   AES         256      TLS_ECDH_anon_WITH_AES_256_CBC_SHA
 xa7     ADH-AES256-GCM-SHA384             DH 2048    AESGCM      256      TLS_DH_anon_WITH_AES_256_GCM_SHA384
 x6d     ADH-AES256-SHA256                 DH 2048    AES         256      TLS_DH_anon_WITH_AES_256_CBC_SHA256
 x3a     ADH-AES256-SHA                    DH 2048    AES         256      TLS_DH_anon_WITH_AES_256_CBC_SHA
 xc5     ADH-CAMELLIA256-SHA256            DH 2048    Camellia    256      TLS_DH_anon_WITH_CAMELLIA_256_CBC_SHA256
 x89     ADH-CAMELLIA256-SHA               DH 2048    Camellia    256      TLS_DH_anon_WITH_CAMELLIA_256_CBC_SHA
 x9d     AES256-GCM-SHA384                 RSA        AESGCM      256      TLS_RSA_WITH_AES_256_GCM_SHA384
 xc0a1   AES256-CCM8                       RSA        AESCCM8     256      TLS_RSA_WITH_AES_256_CCM_8
 xc09d   AES256-CCM                        RSA        AESCCM      256      TLS_RSA_WITH_AES_256_CCM
 x3d     AES256-SHA256                     RSA        AES         256      TLS_RSA_WITH_AES_256_CBC_SHA256
 x35     AES256-SHA                        RSA        AES         256      TLS_RSA_WITH_AES_256_CBC_SHA
 xc0     CAMELLIA256-SHA256                RSA        Camellia    256      TLS_RSA_WITH_CAMELLIA_256_CBC_SHA256
 x84     CAMELLIA256-SHA                   RSA        Camellia    256      TLS_RSA_WITH_CAMELLIA_256_CBC_SHA
 xc051   ARIA256-GCM-SHA384                RSA        ARIAGCM     256      TLS_RSA_WITH_ARIA_256_GCM_SHA384
 xc053   DHE-RSA-ARIA256-GCM-SHA384        DH 2048    ARIAGCM     256      TLS_DHE_RSA_WITH_ARIA_256_GCM_SHA384
 xc061   ECDHE-ARIA256-GCM-SHA384          ECDH 253   ARIAGCM     256      TLS_ECDHE_RSA_WITH_ARIA_256_GCM_SHA384
 xc02f   ECDHE-RSA-AES128-GCM-SHA256       ECDH 253   AESGCM      128      TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
 xc027   ECDHE-RSA-AES128-SHA256           ECDH 253   AES         128      TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256
 xc013   ECDHE-RSA-AES128-SHA              ECDH 253   AES         128      TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA
 x9e     DHE-RSA-AES128-GCM-SHA256         DH 2048    AESGCM      128      TLS_DHE_RSA_WITH_AES_128_GCM_SHA256
 xc0a2   DHE-RSA-AES128-CCM8               DH 2048    AESCCM8     128      TLS_DHE_RSA_WITH_AES_128_CCM_8
 xc09e   DHE-RSA-AES128-CCM                DH 2048    AESCCM      128      TLS_DHE_RSA_WITH_AES_128_CCM
 xc0a0   AES128-CCM8                       RSA        AESCCM8     128      TLS_RSA_WITH_AES_128_CCM_8
 xc09c   AES128-CCM                        RSA        AESCCM      128      TLS_RSA_WITH_AES_128_CCM
 x67     DHE-RSA-AES128-SHA256             DH 2048    AES         128      TLS_DHE_RSA_WITH_AES_128_CBC_SHA256
 x33     DHE-RSA-AES128-SHA                DH 2048    AES         128      TLS_DHE_RSA_WITH_AES_128_CBC_SHA
 xc076   ECDHE-RSA-CAMELLIA128-SHA256      ECDH 253   Camellia    128      TLS_ECDHE_RSA_WITH_CAMELLIA_128_CBC_SHA256
 xbe     DHE-RSA-CAMELLIA128-SHA256        DH 2048    Camellia    128      TLS_DHE_RSA_WITH_CAMELLIA_128_CBC_SHA256
 x9a     DHE-RSA-SEED-SHA                  DH 2048    SEED        128      TLS_DHE_RSA_WITH_SEED_CBC_SHA
 x45     DHE-RSA-CAMELLIA128-SHA           DH 2048    Camellia    128      TLS_DHE_RSA_WITH_CAMELLIA_128_CBC_SHA
 xc018   AECDH-AES128-SHA                  ECDH 253   AES         128      TLS_ECDH_anon_WITH_AES_128_CBC_SHA
 xa6     ADH-AES128-GCM-SHA256             DH 2048    AESGCM      128      TLS_DH_anon_WITH_AES_128_GCM_SHA256
 x6c     ADH-AES128-SHA256                 DH 2048    AES         128      TLS_DH_anon_WITH_AES_128_CBC_SHA256
 x34     ADH-AES128-SHA                    DH 2048    AES         128      TLS_DH_anon_WITH_AES_128_CBC_SHA
 xbf     ADH-CAMELLIA128-SHA256            DH 2048    Camellia    128      TLS_DH_anon_WITH_CAMELLIA_128_CBC_SHA256
 x9b     ADH-SEED-SHA                      DH 2048    SEED        128      TLS_DH_anon_WITH_SEED_CBC_SHA
 x46     ADH-CAMELLIA128-SHA               DH 2048    Camellia    128      TLS_DH_anon_WITH_CAMELLIA_128_CBC_SHA
 x9c     AES128-GCM-SHA256                 RSA        AESGCM      128      TLS_RSA_WITH_AES_128_GCM_SHA256
 x3c     AES128-SHA256                     RSA        AES         128      TLS_RSA_WITH_AES_128_CBC_SHA256
 x2f     AES128-SHA                        RSA        AES         128      TLS_RSA_WITH_AES_128_CBC_SHA
 xba     CAMELLIA128-SHA256                RSA        Camellia    128      TLS_RSA_WITH_CAMELLIA_128_CBC_SHA256
 x96     SEED-SHA                          RSA        SEED        128      TLS_RSA_WITH_SEED_CBC_SHA
 x41     CAMELLIA128-SHA                   RSA        Camellia    128      TLS_RSA_WITH_CAMELLIA_128_CBC_SHA
 xc050   ARIA128-GCM-SHA256                RSA        ARIAGCM     128      TLS_RSA_WITH_ARIA_128_GCM_SHA256
 xc052   DHE-RSA-ARIA128-GCM-SHA256        DH 2048    ARIAGCM     128      TLS_DHE_RSA_WITH_ARIA_128_GCM_SHA256
 xc060   ECDHE-ARIA128-GCM-SHA256          ECDH 253   ARIAGCM     128      TLS_ECDHE_RSA_WITH_ARIA_128_GCM_SHA256
TLS 1.3
 x1302   TLS_AES_256_GCM_SHA384            ECDH 253   AESGCM      256      TLS_AES_256_GCM_SHA384
 x1303   TLS_CHACHA20_POLY1305_SHA256      ECDH 253   ChaCha20    256      TLS_CHACHA20_POLY1305_SHA256
 x1301   TLS_AES_128_GCM_SHA256            ECDH 253   AESGCM      128      TLS_AES_128_GCM_SHA256
By switching our SSL certificate to RSA, we gained around 51 new cipher options, among them ones that mx01.arz.at also claimed to support. FTR, the result above is what you get with the default settings of postfix v3.5.18, being:
smtpd_tls_ciphers = medium
smtpd_tls_mandatory_ciphers = medium
smtpd_tls_mandatory_exclude_ciphers =
smtpd_tls_mandatory_protocols = !SSLv2, !SSLv3
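These can be double-checked on any installation, as postconf -d prints the built-in defaults:
% postconf -d smtpd_tls_ciphers smtpd_tls_mandatory_ciphers smtpd_tls_mandatory_protocols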
But the delay between triggering the password reset mail and getting a mail server connect was getting bigger and bigger. So, while waiting for the next mail to arrive, I decided to capture the network traffic, to be able to dig further into this if it kept failing:
% sudo tshark -n -i eth0 -s 65535 -w arz.pcap -f "host 193.110.182.61 or host 193.110.182.62"
A few hours later the mail server connected again, and the mail went through!
postfix/smtpd[4162835]: connect from mx01.arz.at[193.110.182.61]
postfix/smtpd[4162835]: Anonymous TLS connection established from mx01.arz.at[193.110.182.61]: TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)
postfix/smtpd[4162835]: E50D6401E6: client=mx01.arz.at[193.110.182.61]
postfix/smtpd[4162835]: disconnect from mx01.arz.at[193.110.182.61] ehlo=2 starttls=1 mail=1 rcpt=1 data=1 quit=1 commands=7
Now that we also have the captured network traffic, we can check the details there:
[...]
% tshark -o smtp.decryption:true -r arz.pcap
    1 0.000000000 193.110.182.61 → 203.0.113.42 TCP 74 24699 → 25 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=2261106119 TSecr=0 WS=128
    2 0.000042827 203.0.113.42 → 193.110.182.61 TCP 74 25 → 24699 [SYN, ACK] Seq=0 Ack=1 Win=65160 Len=0 MSS=1460 SACK_PERM=1 TSval=3233422181 TSecr=2261106119 WS=128
    3 0.020719269 193.110.182.61 → 203.0.113.42 TCP 66 24699 → 25 [ACK] Seq=1 Ack=1 Win=29312 Len=0 TSval=2261106139 TSecr=3233422181
    4 0.022883259 203.0.113.42 → 193.110.182.61 SMTP 96 S: 220 mail.example.com ESMTP
    5 0.043682626 193.110.182.61 → 203.0.113.42 TCP 66 24699 → 25 [ACK] Seq=1 Ack=31 Win=29312 Len=0 TSval=2261106162 TSecr=3233422203
    6 0.043799047 193.110.182.61 → 203.0.113.42 SMTP 84 C: EHLO mx01.arz.at
    7 0.043811363 203.0.113.42 → 193.110.182.61 TCP 66 25 → 24699 [ACK] Seq=31 Ack=19 Win=65280 Len=0 TSval=3233422224 TSecr=2261106162
    8 0.043898412 203.0.113.42 → 193.110.182.61 SMTP 253 S: 250-mail.example.com | PIPELINING | SIZE 20240000 | VRFY | ETRN | AUTH PLAIN | AUTH=PLAIN | ENHANCEDSTATUSCODES | 8BITMIME | DSN | SMTPUTF8 | CHUNKING
    9 0.064625499 193.110.182.61 → 203.0.113.42 SMTP 72 C: QUIT
   10 0.064750257 203.0.113.42 → 193.110.182.61 SMTP 81 S: 221 2.0.0 Bye
   11 0.064760200 203.0.113.42 → 193.110.182.61 TCP 66 25 → 24699 [FIN, ACK] Seq=233 Ack=25 Win=65280 Len=0 TSval=3233422245 TSecr=2261106183
   12 0.085573715 193.110.182.61 → 203.0.113.42 TCP 66 24699 → 25 [FIN, ACK] Seq=25 Ack=234 Win=30336 Len=0 TSval=2261106204 TSecr=3233422245
   13 0.085610229 203.0.113.42 → 193.110.182.61 TCP 66 25 → 24699 [ACK] Seq=234 Ack=26 Win=65280 Len=0 TSval=3233422266 TSecr=2261106204
   14 1799.888108373 193.110.182.61 → 203.0.113.42 TCP 74 10330 → 25 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=2262906007 TSecr=0 WS=128
   15 1799.888161311 203.0.113.42 → 193.110.182.61 TCP 74 25 → 10330 [SYN, ACK] Seq=0 Ack=1 Win=65160 Len=0 MSS=1460 SACK_PERM=1 TSval=3235222069 TSecr=2262906007 WS=128
   16 1799.909030335 193.110.182.61 → 203.0.113.42 TCP 66 10330 → 25 [ACK] Seq=1 Ack=1 Win=29312 Len=0 TSval=2262906028 TSecr=3235222069
   17 1799.956621011 203.0.113.42 → 193.110.182.61 SMTP 96 S: 220 mail.example.com ESMTP
   18 1799.977229656 193.110.182.61 → 203.0.113.42 TCP 66 10330 → 25 [ACK] Seq=1 Ack=31 Win=29312 Len=0 TSval=2262906096 TSecr=3235222137
   19 1799.977229698 193.110.182.61 → 203.0.113.42 SMTP 84 C: EHLO mx01.arz.at
   20 1799.977266759 203.0.113.42 → 193.110.182.61 TCP 66 25 → 10330 [ACK] Seq=31 Ack=19 Win=65280 Len=0 TSval=3235222158 TSecr=2262906096
   21 1799.977351663 203.0.113.42 → 193.110.182.61 SMTP 267 S: 250-mail.example.com | PIPELINING | SIZE 20240000 | VRFY | ETRN | STARTTLS | AUTH PLAIN | AUTH=PLAIN | ENHANCEDSTATUSCODES | 8BITMIME | DSN | SMTPUTF8 | CHUNKING
   22 1800.011494861 193.110.182.61 → 203.0.113.42 SMTP 76 C: STARTTLS
   23 1800.011589267 203.0.113.42 → 193.110.182.61 SMTP 96 S: 220 2.0.0 Ready to start TLS
   24 1800.032812294 193.110.182.61 → 203.0.113.42 TLSv1 223 Client Hello
   25 1800.032987264 203.0.113.42 → 193.110.182.61 TLSv1.2 2962 Server Hello
   26 1800.032995513 203.0.113.42 → 193.110.182.61 TCP 1266 25 → 10330 [PSH, ACK] Seq=3158 Ack=186 Win=65152 Len=1200 TSval=3235222214 TSecr=2262906151 [TCP segment of a reassembled PDU]
   27 1800.053546755 193.110.182.61 → 203.0.113.42 TCP 66 10330 → 25 [ACK] Seq=186 Ack=3158 Win=36096 Len=0 TSval=2262906172 TSecr=3235222214
   28 1800.092852469 193.110.182.61 → 203.0.113.42 TCP 66 10330 → 25 [ACK] Seq=186 Ack=4358 Win=39040 Len=0 TSval=2262906212 TSecr=3235222214
   29 1800.092892905 203.0.113.42 → 193.110.182.61 TLSv1.2 900 Certificate, Server Key Exchange, Server Hello Done
   30 1800.113546769 193.110.182.61 → 203.0.113.42 TCP 66 10330 → 25 [ACK] Seq=186 Ack=5192 Win=41856 Len=0 TSval=2262906232 TSecr=3235222273
   31 1800.114763363 193.110.182.61 → 203.0.113.42 TLSv1.2 192 Client Key Exchange, Change Cipher Spec, Encrypted Handshake Message
   32 1800.115000416 203.0.113.42 → 193.110.182.61 TLSv1.2 117 Change Cipher Spec, Encrypted Handshake Message
   33 1800.136070200 193.110.182.61 → 203.0.113.42 TLSv1.2 113 Application Data
   34 1800.136155526 203.0.113.42 → 193.110.182.61 TLSv1.2 282 Application Data
   35 1800.158854473 193.110.182.61 → 203.0.113.42 TLSv1.2 162 Application Data
   36 1800.159254794 203.0.113.42 → 193.110.182.61 TLSv1.2 109 Application Data
   37 1800.180286407 193.110.182.61 → 203.0.113.42 TLSv1.2 144 Application Data
   38 1800.223005960 203.0.113.42 → 193.110.182.61 TCP 66 25 → 10330 [ACK] Seq=5502 Ack=533 Win=65152 Len=0 TSval=3235222404 TSecr=2262906299
   39 1802.230300244 203.0.113.42 → 193.110.182.61 TLSv1.2 146 Application Data
   40 1802.251994333 193.110.182.61 → 203.0.113.42 TCP 2962 [TCP segment of a reassembled PDU]
   41 1802.252034015 203.0.113.42 → 193.110.182.61 TCP 66 25 → 10330 [ACK] Seq=5582 Ack=3429 Win=63616 Len=0 TSval=3235224433 TSecr=2262908371
   42 1802.252279083 193.110.182.61 → 203.0.113.42 TLSv1.2 1295 Application Data
   43 1802.252288316 203.0.113.42 → 193.110.182.61 TCP 66 25 → 10330 [ACK] Seq=5582 Ack=4658 Win=64128 Len=0 TSval=3235224433 TSecr=2262908371
   44 1802.272816060 193.110.182.61 → 203.0.113.42 TLSv1.2 833 Application Data, Application Data
   45 1802.272827542 203.0.113.42 → 193.110.182.61 TCP 66 25 → 10330 [ACK] Seq=5582 Ack=5425 Win=64128 Len=0 TSval=3235224453 TSecr=2262908392
   46 1802.338807683 203.0.113.42 → 193.110.182.61 TLSv1.2 131 Application Data
   47 1802.398968611 193.110.182.61 → 203.0.113.42 TCP 66 10330 → 25 [ACK] Seq=5425 Ack=5647 Win=44800 Len=0 TSval=2262908518 TSecr=3235224519
   48 1863.257457500 193.110.182.61 → 203.0.113.42 TLSv1.2 101 Application Data
   49 1863.257495688 203.0.113.42 → 193.110.182.61 TCP 66 25 → 10330 [ACK] Seq=5647 Ack=5460 Win=64128 Len=0 TSval=3235285438 TSecr=2262969376
   50 1863.257654942 203.0.113.42 → 193.110.182.61 TLSv1.2 110 Application Data
   51 1863.257721010 203.0.113.42 → 193.110.182.61 TLSv1.2 97 Encrypted Alert
   52 1863.278242216 193.110.182.61 → 203.0.113.42 TCP 66 10330 → 25 [ACK] Seq=5460 Ack=5691 Win=44800 Len=0 TSval=2262969397 TSecr=3235285438
   53 1863.278464176 193.110.182.61 → 203.0.113.42 TCP 66 10330 → 25 [RST, ACK] Seq=5460 Ack=5723 Win=44800 Len=0 TSval=2262969397 TSecr=3235285438
% tshark -O tls -r arz.pcap
[...]
Transport Layer Security
    TLSv1 Record Layer: Handshake Protocol: Client Hello
        Content Type: Handshake (22)
        Version: TLS 1.0 (0x0301)
        Length: 152
        Handshake Protocol: Client Hello
            Handshake Type: Client Hello (1)
            Length: 148
            Version: TLS 1.2 (0x0303)
            Random: 4575d1e7c93c09a564edc00b8b56ea6f5d826f8cfe78eb980c451a70a9c5123f
                GMT Unix Time: Dec  5, 2006 21:09:11.000000000 CET
                Random Bytes: c93c09a564edc00b8b56ea6f5d826f8cfe78eb980c451a70a9c5123f
            Session ID Length: 0
            Cipher Suites Length: 26
            Cipher Suites (13 suites)
                Cipher Suite: TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (0xc030)
                Cipher Suite: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (0xc02f)
                Cipher Suite: TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384 (0xc028)
                Cipher Suite: TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256 (0xc027)
                Cipher Suite: TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA (0xc014)
                Cipher Suite: TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA (0xc013)
                Cipher Suite: TLS_RSA_WITH_AES_256_GCM_SHA384 (0x009d)
                Cipher Suite: TLS_RSA_WITH_AES_128_GCM_SHA256 (0x009c)
                Cipher Suite: TLS_RSA_WITH_AES_256_CBC_SHA256 (0x003d)
                Cipher Suite: TLS_RSA_WITH_AES_128_CBC_SHA256 (0x003c)
                Cipher Suite: TLS_RSA_WITH_AES_256_CBC_SHA (0x0035)
                Cipher Suite: TLS_RSA_WITH_AES_128_CBC_SHA (0x002f)
                Cipher Suite: TLS_EMPTY_RENEGOTIATION_INFO_SCSV (0x00ff)
[...]
Transport Layer Security
    TLSv1.2 Record Layer: Handshake Protocol: Server Hello
        Content Type: Handshake (22)
        Version: TLS 1.2 (0x0303)
        Length: 89
        Handshake Protocol: Server Hello
            Handshake Type: Server Hello (2)
            Length: 85
            Version: TLS 1.2 (0x0303)
            Random: cf2ed24e3300e95e5f56023bf8b4e5904b862bb2ed8a5796444f574e47524401
                GMT Unix Time: Feb 23, 2080 23:16:46.000000000 CET
                Random Bytes: 3300e95e5f56023bf8b4e5904b862bb2ed8a5796444f574e47524401
            Session ID Length: 32
            Session ID: 63d041b126ecebf857d685abd9d4593c46a3672e1ad76228f3eacf2164f86fb9
            Cipher Suite: TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (0xc030)
[...]
In this network dump we see which cipher suites are offered, and TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 here is the Cipher Suite Name in IANA/RFC speak. This corresponds to ECDHE-RSA-AES256-GCM-SHA384 in openssl speak (see Mozilla's cipher suite correspondence table), which we also saw in the postfix log. Mission accomplished! :) Now, if we're interested in avoiding certain ciphers and increasing the security level, we can e.g. get rid of the SEED, CAMELLIA and all anonymous ciphers, and accept only TLS v1.2 + v1.3, by further adjusting postfix's main.cf:
smtpd_tls_ciphers = high
smtpd_tls_exclude_ciphers = aNULL CAMELLIA
smtpd_tls_mandatory_ciphers = high
smtpd_tls_mandatory_protocols = TLSv1.2 TLSv1.3
smtpd_tls_protocols = TLSv1.2 TLSv1.3
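One way to apply these without editing main.cf by hand, as a sketch using postconf and a reload:
% sudo postconf -e 'smtpd_tls_ciphers = high' \
    'smtpd_tls_exclude_ciphers = aNULL CAMELLIA' \
    'smtpd_tls_mandatory_ciphers = high' \
    'smtpd_tls_mandatory_protocols = TLSv1.2 TLSv1.3' \
    'smtpd_tls_protocols = TLSv1.2 TLSv1.3'
% sudo systemctl reload postfix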
Which then gives us:
% testssl --cipher-per-proto -t=smtp mail.example.com:25
[...]
Hexcode  Cipher Suite Name (OpenSSL)       KeyExch.   Encryption  Bits     Cipher Suite Name (IANA/RFC)
-----------------------------------------------------------------------------------------------------------------------------
SSLv2
SSLv3
TLS 1
TLS 1.1
TLS 1.2
 xc030   ECDHE-RSA-AES256-GCM-SHA384       ECDH 253   AESGCM      256      TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
 xc028   ECDHE-RSA-AES256-SHA384           ECDH 253   AES         256      TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384
 xc014   ECDHE-RSA-AES256-SHA              ECDH 253   AES         256      TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
 x9f     DHE-RSA-AES256-GCM-SHA384         DH 2048    AESGCM      256      TLS_DHE_RSA_WITH_AES_256_GCM_SHA384
 xcca8   ECDHE-RSA-CHACHA20-POLY1305       ECDH 253   ChaCha20    256      TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
 xccaa   DHE-RSA-CHACHA20-POLY1305         DH 2048    ChaCha20    256      TLS_DHE_RSA_WITH_CHACHA20_POLY1305_SHA256
 xc0a3   DHE-RSA-AES256-CCM8               DH 2048    AESCCM8     256      TLS_DHE_RSA_WITH_AES_256_CCM_8
 xc09f   DHE-RSA-AES256-CCM                DH 2048    AESCCM      256      TLS_DHE_RSA_WITH_AES_256_CCM
 x6b     DHE-RSA-AES256-SHA256             DH 2048    AES         256      TLS_DHE_RSA_WITH_AES_256_CBC_SHA256
 x39     DHE-RSA-AES256-SHA                DH 2048    AES         256      TLS_DHE_RSA_WITH_AES_256_CBC_SHA
 x9d     AES256-GCM-SHA384                 RSA        AESGCM      256      TLS_RSA_WITH_AES_256_GCM_SHA384
 xc0a1   AES256-CCM8                       RSA        AESCCM8     256      TLS_RSA_WITH_AES_256_CCM_8
 xc09d   AES256-CCM                        RSA        AESCCM      256      TLS_RSA_WITH_AES_256_CCM
 x3d     AES256-SHA256                     RSA        AES         256      TLS_RSA_WITH_AES_256_CBC_SHA256
 x35     AES256-SHA                        RSA        AES         256      TLS_RSA_WITH_AES_256_CBC_SHA
 xc051   ARIA256-GCM-SHA384                RSA        ARIAGCM     256      TLS_RSA_WITH_ARIA_256_GCM_SHA384
 xc053   DHE-RSA-ARIA256-GCM-SHA384        DH 2048    ARIAGCM     256      TLS_DHE_RSA_WITH_ARIA_256_GCM_SHA384
 xc061   ECDHE-ARIA256-GCM-SHA384          ECDH 253   ARIAGCM     256      TLS_ECDHE_RSA_WITH_ARIA_256_GCM_SHA384
 xc02f   ECDHE-RSA-AES128-GCM-SHA256       ECDH 253   AESGCM      128      TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
 xc027   ECDHE-RSA-AES128-SHA256           ECDH 253   AES         128      TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256
 xc013   ECDHE-RSA-AES128-SHA              ECDH 253   AES         128      TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA
 x9e     DHE-RSA-AES128-GCM-SHA256         DH 2048    AESGCM      128      TLS_DHE_RSA_WITH_AES_128_GCM_SHA256
 xc0a2   DHE-RSA-AES128-CCM8               DH 2048    AESCCM8     128      TLS_DHE_RSA_WITH_AES_128_CCM_8
 xc09e   DHE-RSA-AES128-CCM                DH 2048    AESCCM      128      TLS_DHE_RSA_WITH_AES_128_CCM
 xc0a0   AES128-CCM8                       RSA        AESCCM8     128      TLS_RSA_WITH_AES_128_CCM_8
 xc09c   AES128-CCM                        RSA        AESCCM      128      TLS_RSA_WITH_AES_128_CCM
 x67     DHE-RSA-AES128-SHA256             DH 2048    AES         128      TLS_DHE_RSA_WITH_AES_128_CBC_SHA256
 x33     DHE-RSA-AES128-SHA                DH 2048    AES         128      TLS_DHE_RSA_WITH_AES_128_CBC_SHA
 x9c     AES128-GCM-SHA256                 RSA        AESGCM      128      TLS_RSA_WITH_AES_128_GCM_SHA256
 x3c     AES128-SHA256                     RSA        AES         128      TLS_RSA_WITH_AES_128_CBC_SHA256
 x2f     AES128-SHA                        RSA        AES         128      TLS_RSA_WITH_AES_128_CBC_SHA
 xc050   ARIA128-GCM-SHA256                RSA        ARIAGCM     128      TLS_RSA_WITH_ARIA_128_GCM_SHA256
 xc052   DHE-RSA-ARIA128-GCM-SHA256        DH 2048    ARIAGCM     128      TLS_DHE_RSA_WITH_ARIA_128_GCM_SHA256
 xc060   ECDHE-ARIA128-GCM-SHA256          ECDH 253   ARIAGCM     128      TLS_ECDHE_RSA_WITH_ARIA_128_GCM_SHA256
TLS 1.3
 x1302   TLS_AES_256_GCM_SHA384            ECDH 253   AESGCM      256      TLS_AES_256_GCM_SHA384
 x1303   TLS_CHACHA20_POLY1305_SHA256      ECDH 253   ChaCha20    256      TLS_CHACHA20_POLY1305_SHA256
 x1301   TLS_AES_128_GCM_SHA256            ECDH 253   AESGCM      128      TLS_AES_128_GCM_SHA256
Don't forget to also adjust the smtp_tls_* settings accordingly for your sending side (a sketch follows below). For further information see the Postfix TLS Support documentation. Also check out options like tls_ssl_options (setting it to e.g. NO_COMPRESSION) and tls_preempt_cipherlist (setting it to yes prefers the server's order of ciphers over the client's).
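As a sketch, matching client-side settings for main.cf could mirror the above (adjust to your own delivery policy; overly strict smtp_tls_* settings can affect opportunistic TLS towards old remote servers):
smtp_tls_ciphers = high
smtp_tls_exclude_ciphers = aNULL CAMELLIA
smtp_tls_mandatory_ciphers = high
smtp_tls_mandatory_protocols = TLSv1.2 TLSv1.3
smtp_tls_protocols = TLSv1.2 TLSv1.3
Conclusions: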

21 August 2023

Melissa Wen: AMD Driver-specific Properties for Color Management on Linux (Part 1)

TL;DR: Color is a visual perception. Human eyes can detect a broader range of colors than any device in the graphics chain. Since each device can generate, capture or reproduce a specific subset of colors and tones, color management controls color conversion and calibration across devices to ensure a more accurate and consistent color representation. We can expose a GPU-accelerated display color management pipeline to support this process and enhance results, and this is what we are doing on Linux to improve color management on Gamescope/SteamDeck. Even with the challenges of being external developers, we have been working on mapping AMD GPU color capabilities to the Linux kernel color management interface, which is a combination of DRM and AMD driver-specific color properties. This more extensive color management pipeline includes pre-defined Transfer Functions, 1-Dimensional LookUp Tables (1D LUTs), and 3D LUTs before and after the plane composition/blending.
The study of color is well-established and has been explored for many years. Color science and research findings have also guided technology innovations. As a result, color in Computer Graphics is a very complex topic that I'm putting a lot of effort into becoming familiar with. I always find myself rereading all the materials I have collected about color space and operations since I started this journey (about one year ago). I also understand how hard it is to find consensus on some color subjects, as exemplified by all the explanations around the 2015 online viral phenomenon of The Black and Blue Dress. Have you heard about it? What is the color of the dress for you? So, taking into account my skills with colors and building consensus, this blog post only focuses on GPU hardware capabilities to support color management :-D If you want to learn more about color concepts and color on Linux, you can find useful links at the end of this blog post.

Linux Kernel, show me the colors ;D The DRM color management interface only exposes a small set of post-blending color properties. Proposals to enhance the DRM color API from different vendors have landed on the subsystem mailing list over the last few years. On one hand, we got some suggestions to extend the DRM post-blending/CRTC color API: DRM CRTC 3D LUT for R-Car (2020 version); DRM CRTC 3D LUT for Intel (draft - 2020); DRM CRTC 3D LUT for AMD by Igalia (v2 - 2023); DRM CRTC 3D LUT for R-Car (v2 - 2023). On the other hand, some proposals to extend the DRM pre-blending/plane API: DRM plane colors for Intel (v2 - 2021); DRM plane API for AMD (v3 - 2021); DRM plane 3D LUT for AMD - 2021. Finally, Simon Ser sent the latest proposal in May 2023: Plane color pipeline KMS uAPI, from discussions in the 2023 Display/HDR Hackfest, and it is still under evaluation by the Linux Graphics community. All previous proposals seek a generic solution for expanding the API, but many seem to have stalled due to the uncertainty of matching the hardware capabilities of all vendors well. Meanwhile, the use of AMD color capabilities on Linux remained limited by the DRM interface, as shown in the DCN 3.0 family color caps and mapping diagram below, which depicts the Linux/DRM color interface without driver-specific color properties [*]. Bearing in mind that we need to know the variety of color pipelines in the subsystem to be clear about a generic solution, we decided to approach the issue from a different perspective and worked on enabling a set of Driver-Specific Color Properties for AMD Display Drivers. As a result, I recently sent another round of the AMD driver-specific color mgmt API. For those who have been following the AMD driver-specific proposal since the beginning (see [RFC][V1]), the main new features of the latest version [v2] are the addition of a pre-blending Color Transformation Matrix (plane CTM) and the differentiation of Pre-defined Transfer Functions (TF) supported by color blocks. For those who just got here, I will recap this work in two blog posts. This one describes the current status of the AMD display driver in the Linux kernel/DRM subsystem and what changes with the driver-specific properties. In the next post, we go deeper to describe the features of each color block and provide a better picture of what is available in terms of color management for Linux.

The Linux kernel color management API and AMD hardware color capabilities Before discussing colors in the Linux kernel with AMD hardware, consider accessing the Linux kernel documentation (version 6.5.0-rc5). In the AMD Display documentation, you will find my previous work documenting AMD hardware color capabilities and the Color Management Properties. It describes how the AMD Display Manager (DM) intermediates requests between the AMD Display Core component (DC) and the Linux/DRM kernel interface for color management features. It also describes the relevant functions for calling the AMD color module when building curves for content space transformations. A subsection also describes hardware color capabilities and how they evolve between versions. This subsection, DC Color Capabilities between DCN generations, is a good starting point to understand what we have been doing on the kernel side to provide a broader color management API with AMD driver-specific properties.

Why do we need more kernel color properties on Linux? Blending is the process of combining multiple planes (framebuffers abstraction) according to their mode settings. Before blending, we can manage the colors of various planes separately; after blending, we have combined those planes in only one output per CRTC. Color conversions after blending would be enough in a single-plane scenario or when dealing with planes in the same color space on the kernel side. Still, it cannot help to handle the blending of multiple planes with different color spaces and luminance levels. With plane color management properties, userspace can get a more accurate representation of colors to deal with the diversity of color profiles of devices in the graphics chain, bring a wide color gamut (WCG), and convert High-Dynamic-Range (HDR) content to Standard-Dynamic-Range (SDR) content (and vice-versa). With a GPU-accelerated display color management pipeline, we can use hardware blocks for color conversions and color mapping and support advanced color management. The current DRM color management API enables us to perform some color conversions after blending, but there is no interface to calibrate input space by planes. Note that here I'm not considering some workarounds in the AMD display manager mapping of DRM CRTC de-gamma and DRM CRTC CTM property to pre-blending DC de-gamma and gamut remap block, respectively. So, in more detail, it only exposes three post-blending features (a usage sketch follows the list):
  • DRM CRTC de-gamma: used to convert the framebuffer's colors to linear gamma;
  • DRM CRTC CTM: used for color space conversion;
  • DRM CRTC gamma: used to convert colors to the gamma space of the connected screen.
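To make the API shape concrete, here is a minimal userspace sketch (assuming libdrm and a driver that exposes the generic GAMMA_LUT CRTC property; property-id lookup and error handling are trimmed) of feeding a 1D LUT through the atomic API. Driver-specific properties, such as the AMD ones discussed below, are programmed the same way, just under different property names:

/* build with: gcc gamma.c $(pkg-config --cflags --libs libdrm) */
#include <stdint.h>
#include <stdlib.h>
#include <xf86drm.h>
#include <xf86drmMode.h>
#include <drm_mode.h>

/* Create a blob holding an identity 1D LUT in the drm_color_lut format
 * (16 bits per channel); lut_size normally comes from the read-only
 * GAMMA_LUT_SIZE property of the CRTC. */
static uint32_t identity_lut_blob(int fd, uint32_t lut_size)
{
    struct drm_color_lut *lut = calloc(lut_size, sizeof(*lut));
    uint32_t blob_id = 0;

    for (uint32_t i = 0; i < lut_size; i++) {
        uint16_t v = (uint16_t)((uint64_t)i * 0xffff / (lut_size - 1));
        lut[i].red = lut[i].green = lut[i].blue = v;
    }
    drmModeCreatePropertyBlob(fd, lut, lut_size * sizeof(*lut), &blob_id);
    free(lut);
    return blob_id;
}

/* gamma_lut_prop_id must be resolved beforehand, e.g. by walking
 * drmModeObjectGetProperties(fd, crtc_id, DRM_MODE_OBJECT_CRTC) and
 * matching the property name "GAMMA_LUT". */
static int apply_gamma(int fd, uint32_t crtc_id,
                       uint32_t gamma_lut_prop_id, uint32_t blob_id)
{
    drmModeAtomicReq *req = drmModeAtomicAlloc();
    int ret;

    drmModeAtomicAddProperty(req, crtc_id, gamma_lut_prop_id, blob_id);
    ret = drmModeAtomicCommit(fd, req, 0 /* no modeset needed */, NULL);
    drmModeAtomicFree(req);
    return ret;
}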

AMD driver-specific color management interface We can compare the Linux color management API with and without the driver-specific color properties. From now on, we denote driver-specific properties with the AMD prefix and generic properties with the DRM prefix. For visual comparison, I bring the DCN 3.0 family color caps and mapping diagram closer and present it here again: Mixing AMD driver-specific color properties with DRM generic color properties, we have a broader Linux color management system with the following features exposed by properties in the plane and CRTC interface, as summarized by this updated diagram: The blocks highlighted by red lines are the new properties in the driver-specific interface developed by me (Igalia) and Joshua (Valve). The red dashed lines are new links between API and AMD driver components implemented by us to connect the Linux/DRM interface to AMD hardware blocks, mapping components accordingly. In short, we have the following color management properties exposed by the DRM/AMD display driver:
  • Pre-blending - AMD Display Pipe and Plane (DPP):
    • AMD plane de-gamma: 1D LUT and pre-defined transfer functions; used to linearize the input space of a plane;
    • AMD plane CTM: 3x4 matrix; used to convert plane color space;
    • AMD plane shaper: 1D LUT and pre-defined transfer functions; used to delinearize and/or normalize colors before applying 3D LUT;
    • AMD plane 3D LUT: 17x17x17 size with 12 bit-depth; three dimensional lookup table used for advanced color mapping;
    • AMD plane blend/out gamma: 1D LUT and pre-defined transfer functions; used to linearize back the color space after 3D LUT for blending.
  • Post-blending - AMD Multiple Pipe/Plane Combined (MPC):
    • DRM CRTC de-gamma: 1D LUT (can't be set together with plane de-gamma);
    • DRM CRTC CTM: 3x3 matrix (remapped to post-blending matrix);
    • DRM CRTC gamma: 1D LUT + AMD CRTC gamma TF; added to take advantage of driver pre-defined transfer functions;
Note: You can find more about AMD display blocks in the Display Core Next (DCN) - Linux kernel documentation, provided by Rodrigo Siqueira (Linux/AMD display developer) in a 2021-documentation series. In the next post, I ll revisit this topic, explaining display and color blocks in detail.

How did we get a large set of color features from AMD display hardware? So, looking at AMD hardware color capabilities in the first diagram, we can see no post-blending (MPC) de-gamma block in any hardware family. We can also see that the AMD display driver maps CRTC/post-blending CTM to pre-blending (DPP) gamut_remap, but there is a post-blending (MPC) gamut_remap (DRM CTM) from newer hardware versions that include SteamDeck hardware. You can find more details about hardware versions in the Linux kernel documentation/AMDGPU Product Information. I needed to rework these two mappings mentioned above to provide pre-blending/plane de-gamma and CTM for SteamDeck. I changed the DC mapping to detach stream gamut remap matrices from the DPP gamut remap block. That means mapping AMD plane CTM directly to the DPP/pre-blending gamut remap block and DRM CRTC CTM to the MPC/post-blending gamut remap block. In this sense, I also limited plane CTM properties to those hardware versions with MPC/post-blending gamut_remap capabilities, since older versions cannot support this feature without clashes with DRM CRTC CTM. Unfortunately, I couldn't prevent conflict between AMD plane de-gamma and DRM CRTC de-gamma, since post-blending de-gamma isn't available in any AMD hardware version so far. The fact is that a post-blending de-gamma makes little sense in the AMD color pipeline, where plane blending works better in a linear space, and there are enough color blocks to linearize content before blending. To deal with this conflict, the driver now rejects atomic commits if users try to set both AMD plane de-gamma and DRM CRTC de-gamma simultaneously. Finally, we had no other clashes when enabling other AMD driver-specific color properties for our use case, Gamescope/SteamDeck. Our main work for the remaining properties was understanding the data flow of each property, the hardware capabilities and limitations, and how to shape the data for programming the registers - AMD color block capabilities (and limitations) are the topics of the next blog post. Besides that, we fixed some driver bugs along the way, since it was the first Linux use case for most of the new color properties, and some behaviors are only exposed when exercising the engine. Take a look at the Gamescope/Steam Deck Color Pipeline [**], and see how Gamescope uses the new API to manage color space conversions and calibration (please click on the image for a better view). In the next blog post, I'll describe the implementation and technical details of each pre- and post-blending color block/property on the AMD display driver. * Thanks to Harry Wentland for helping with diagrams, color concepts and AMD capabilities. ** Thanks to Joshua Ashton for providing and explaining the Gamescope/Steam Deck color pipeline. *** Thanks to the Linux Graphics community - explicitly Harry, Joshua, Pekka, Simon, Sebastian, Siqueira, Alex H. and Ville - for all the learning during this Linux DRM/AMD color journey. Also, thanks to Carlos and Tomas for organizing the 2023 Display/HDR Hackfest, where we had a great and immersive opportunity to discuss Color & HDR on Linux.

4 August 2023

Reproducible Builds: Reproducible Builds in July 2023

Welcome to the July 2023 report from the Reproducible Builds project. In our reports, we try to outline the most important things that we have been up to over the past month. As ever, if you are interested in contributing to the project, please visit the Contribute page on our website.
Marcel Fourné et al. presented at the IEEE Symposium on Security and Privacy in San Francisco, CA on The Importance and Challenges of Reproducible Builds for Software Supply Chain Security. As summarised in last month's report, the abstract of their paper begins:
The 2020 Solarwinds attack was a tipping point that caused a heightened awareness about the security of the software supply chain and in particular the large amount of trust placed in build systems. Reproducible Builds (R-Bs) provide a strong foundation to build defenses for arbitrary attacks against build systems by ensuring that given the same source code, build environment, and build instructions, bitwise-identical artifacts are created. (PDF)

Chris Lamb published an interview with Simon Butler, associate senior lecturer in the School of Informatics at the University of Skövde, on the business adoption of Reproducible Builds. (This is actually the seventh instalment in a series featuring the projects, companies and individuals who support our project. We started this series by featuring the Civil Infrastructure Platform project, and followed this up with a post about the Ford Foundation as well as recent ones about ARDC, the Google Open Source Security Team (GOSST), Bootstrappable Builds, the F-Droid project and David A. Wheeler.) Vagrant Cascadian presented Breaking the Chains of Trusting Trust at FOSSY 2023.
Rahul Bajaj has been working with Roland Clobus on merging an overview of environment variations to our website:
I have identified 16 root causes for unreproducible builds in my empirical study, which I have linked to the corresponding documentation. The initial MR right now contains information about 10 root causes. For each root cause, I have provided a definition, a notable instance, and a workaround. However, I have only found workarounds for 5 out of the 10 root causes listed in this merge request. In the upcoming commits, I plan to add an additional 6 root causes. I kindly request you review the text for any necessary refinements, modifications, or corrections. Additionally, I would appreciate the help with documentation for the solutions/workarounds for the remaining root causes: Archive Metadata, Build ID, File System Ordering, File Permissions, and Snippet Encoding. Your input on the identified root causes for unreproducible builds would be greatly appreciated. [ ]

Just a reminder that our upcoming Reproducible Builds Summit is set to take place from October 31st to November 2nd 2023 in Hamburg, Germany. Our summits are a unique gathering that brings together attendees from diverse projects, united by a shared vision of advancing the Reproducible Builds effort. During this enriching event, participants will have the opportunity to engage in discussions, establish connections and exchange ideas to drive progress in this vital field. If you're interested in joining us this year, please make sure to read the event page, which has more details about the event and location.
There was more progress towards making the Go programming language ecosystem reproducible this month. In addition, kpcyrd posted to our mailing list to report that:
while packaging govulncheck for Arch Linux I noticed a checksum mismatch for a tar file I downloaded from go.googlesource.com. I used diffoscope to compare the .tar file I downloaded with the .tar file the build server downloaded, and noticed the timestamps are different.
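One way to surface this kind of divergence, alongside diffoscope, is to list each archive member's timestamp directly. A small illustrative sketch (the file names are hypothetical):

import tarfile

# Print every member's mtime for two copies of "the same" tarball, so
# that differing timestamps, as in the report above, stand out.
for name in ("upstream.tar", "buildserver.tar"):
    with tarfile.open(name) as tar:
        for member in tar.getmembers():
            print(name, member.name, member.mtime)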

In Debian, 20 reviews of Debian packages were added, 25 were updated and 25 were removed this month, adding to our knowledge about identified issues. A number of issue types were updated, including marking ffile_prefix_map_passed_to_clang as being fixed since Debian bullseye [ ] and adding a Debian bug tracker reference for the nondeterminism_added_by_pyqt5_pyrcc5 issue [ ]. In addition, Roland Clobus posted another detailed update of the status of reproducible Debian ISO images on our mailing list. In particular, Roland helpfully summarised that "live images are looking good, and the number of (passing) automated tests is growing".
Bernhard M. Wiedemann published another monthly report about reproducibility within openSUSE.
F-Droid added 20 new reproducible apps in July, making 165 apps in total that are published with Reproducible Builds and using the upstream developer's signature. [ ]
The Sphinx documentation tool recently accepted a change to improve the deterministic reproducibility of documentation. Its internal util.inspect.object_description attempts to sort collections, but this can fail. The change handles the failure case by using string-based object descriptions as a fallback deterministic sort ordering, as well as adding recursive object-description calls for list and tuple datatypes. As a result, documentation generated by Sphinx will be more likely to be automatically reproducible (a sketch of the approach appears below). Lastly in news, kpcyrd posted to our mailing list announcing a new repro-env tool:
My initial interest in reproducible builds was "how do I distribute pre-compiled binaries on GitHub without people raising security concerns about them". I've cycled back to this original problem about 5 years later and built a tool that is meant to address this. [ ]
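Returning to the Sphinx change mentioned above, the shape of that fix can be sketched as follows. This is an illustration of the fallback-sorting idea only, not the actual Sphinx code:

def stable_description(collection):
    """Sort items for display; when they are not mutually comparable,
    fall back to sorting by string representation so the output is
    still deterministic from run to run."""
    try:
        items = sorted(collection)
    except TypeError:
        items = sorted(collection, key=repr)
    return ", ".join(repr(item) for item in items)

# A mixed-type set makes plain sorted() raise TypeError, but the
# fallback still yields the same output on every run:
print(stable_description({1, "a", (2, 3)}))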

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches.
In diffoscope development this month, versions 244, 245 and 246 were uploaded to Debian unstable by Chris Lamb, who also made the following changes:
  • Don't include the file size in image metadata. It is, at best, distracting, and it is already in the directory metadata. [ ]
  • Add compatibility with libarchive-5. [ ]
  • Mark that the test_dex::test_javap_14_differences test requires the procyon tool. [ ]
  • Initial work on DOS/MBR extraction. [ ]
  • Move to using assert_diff in the .ico and .jpeg tests. [ ]
  • Temporarily mark some Android-related tests as XFAIL due to Debian bugs #1040941 & #1040916. [ ]
  • Fix the test skipped reason generation in the case of a version outside of the required range. [ ]
  • Update copyright years. [ ][ ]
  • Fix try.diffoscope.org. [ ]
In addition, Gianfranco Costamagna added support for LLVM version 16. [ ]

Testing framework The Reproducible Builds project operates a comprehensive testing framework (available at tests.reproducible-builds.org) in order to check packages and other artifacts for reproducibility. In July, a number of changes were made by Holger Levsen:
  • General changes:
    • Upgrade Jenkins host to Debian bookworm now that Debian 12.1 is out. [ ][ ][ ][ ]
    • djm: improve UX when rebooting a node fails. [ ]
    • djm: reduce wait time between rebooting nodes. [ ]
  • Debian-related changes:
    • Various refactoring of the Debian scheduler. [ ][ ][ ]
    • Make Debian live builds more robust with respect to salsa.debian.org returning HTTP 502 errors. [ ][ ]
    • Use the legacy SCP protocol instead of the SFTP protocol when transferring Debian live builds. [ ][ ]
    • Speed up a number of database queries (thanks, Myon!). [ ][ ][ ][ ][ ]
    • Split the create_meta_pkg_sets job into two (for Debian unstable and Debian testing) to halve the job runtime to approximately 90 minutes. [ ][ ]
    • Split scheduler job into four separate jobs, one for each tested architecture. [ ][ ]
    • Treat more PostgreSQL errors as serious (for some jobs). [ ]
    • Re-enable automatic database documentation now that postgresql_autodoc is back in Debian bookworm. [ ]
    • Remove various hardcoding of Debian release names. [ ]
    • Drop some i386 special casing. [ ]
  • Other distributions:
    • Speed up Alpine SQL queries. [ ]
    • Adjust CSS layout for Arch Linux pages to match 3 and not 4 repos being tested. [ ]
    • Drop the community Arch Linux repo as it has now been merged into the extra repo. [ ]
    • Speed up a number of Arch-related database queries. [ ]
    • Try harder to properly cleanup after building OpenWrt packages. [ ]
    • Drop all kfreebsd-related tests now that it's officially dead. [ ]
  • System health:
    • Always ignore some well-known harmless orphan processes. [ ][ ][ ]
    • Detect another case of job failure due to Jenkins shutdown. [ ]
    • Show all non co-installable package sets on the status page. [ ]
    • Warn that some specific reboot nodes are currently false-positives. [ ]
  • Node health checks:
    • Run system and node health checks for Jenkins less frequently. [ ]
    • Try to restart any failed dpkg-db-backup [ ] and munin-node services [ ].
In addition, Vagrant Cascadian updated the paths in our automated tests to use the same paths used by the official Debian build servers. [ ]

If you are interested in contributing to the Reproducible Builds project, please visit the Contribute page on our website. You can also get in touch with us via our mailing list.

1 August 2023

Reproducible Builds: Supporter spotlight: Simon Butler on business adoption of Reproducible Builds

The Reproducible Builds project relies on several projects, supporters and sponsors for financial support, but they are also valued as ambassadors who spread the word about our project and the work that we do. This is the seventh instalment in a series featuring the projects, companies and individuals who support the Reproducible Builds project. We started this series by featuring the Civil Infrastructure Platform project, and followed this up with a post about the Ford Foundation as well as recent ones about ARDC, the Google Open Source Security Team (GOSST), Bootstrappable Builds, the F-Droid project and David A. Wheeler. Today, however, we will be talking with Simon Butler, an associate senior lecturer in the School of Informatics at the University of Skövde, where he undertakes research in software engineering that focuses on IoT and open source software, and contributes to the teaching of computer science to undergraduates.

Chris: For those who have not heard of it before, can you tell us more about the School of Informatics at Skövde University? Simon: Certainly, but I may be a little long-winded. Skövde is a city in the area between the two large lakes in southern Sweden. The city is a busy place. Skövde is home to the regional hospital, some of Volvo's manufacturing facilities, two regiments of the Swedish defence force, a lot of businesses in the Swedish computer games industry, other tech companies and more. The University of Skövde is relatively small. Sweden's large land area and low population density mean that regional centres such as Skövde are important and local universities support businesses by training new staff and supporting innovation. The School of Informatics has two divisions. One focuses on teaching and researching computer games. The other division encompasses a wider range of teaching and research, including computer science, web development, computer security, network administration, data science and so on.
Chris: You recently had an open-access paper published in Software Quality Journal. Could you tell us a little bit more about it and perhaps briefly summarise its key findings? Simon: The paper is one output of a collaborative research project with six Swedish businesses that use open source software. There are two parts to the paper. The first consists of an analysis of what the group of businesses in the project know about Reproducible Builds (R-Bs), their experiences with R-Bs and their perception of the value of R-Bs to the businesses. The second part is an interview study with business practitioners and others with experience and expertise in R-Bs. We set out to try to understand the extent to which software-intensive businesses were aware of R-Bs, the technical and business reasons they were or were not using R-Bs, and to document the business and technical use cases for R-Bs. The key findings were that businesses are aware of R-Bs, and some are using R-Bs as part of their day-to-day development process. Some of the uses for R-Bs we found were not previously documented. We also found that businesses understood the value R-Bs have as part of engineering and software quality processes. They are also aware of the costs of implementing R-Bs and that R-Bs are an intangible value proposition - in other words, businesses can add value through process improvement by using R-Bs. But, currently at least, R-Bs are not a selling point for software or products.
Chris: You performed a large number of interviews in order to prepare your paper. What was the most surprising response to you? Simon: Most surprising is a good question. Everybody I spoke to brought something new to my understanding of R-Bs, and many responses surprised me. The interviewees that surprised me most were I01 and I02 (interviews were anonymised and interviewees were assigned numeric identities). I02 described the sceptical perspective that there is a viable, pragmatic alternative to R-Bs - verifiable builds - which I was aware of before undertaking the research. The company had developed a system sufficiently robust for their needs, and it worked well. With a large archive of software used in production, they couldn't justify the cost of retrofitting a different solution that might only offer small advantages over the existing system. That doesn't really sound too surprising, but the interview was one of the first I did on this topic, and I was very focused on the value of, and need for, trust in a system that motivated the R-B. The solution used by the company requires trust, but they seem to have established sufficient trust for their needs by securing their build systems to the extent that they are more or less tamper-proof. The other big surprise for me was I01's use of R-Bs to support the verification of system configuration in a system with multiple embedded components at boot time. It's such an obvious application of R-Bs, and exactly the kind of response I hoped to get from interviewees. However, it is another instance of a solution where trust is only one factor. In the first instance, the developer is using R-Bs to establish trust in the toolchain. There is also the second application, where the developer can use a set of R-Bs to establish that a deployed system consists of compatible components. While this might not sound too significant, there appear to be some important potential applications. One that came to mind immediately is the problem of firmware updates on nodes in IoT systems, where the node needs to update quickly with limited downtime and without failure. The node also needs to be able to roll back any update proposed by a server if there are conflicts with the current configuration or if any tests on the node fail. Perhaps the chances of failure could be reduced if a node could instead negotiate with a server to determine a safe path to migrate from its current configuration to a working configuration with the upgraded components the central system requires? Another potential application appears to be in the configuration management of AI systems, where decisions need to be explainable. A means of specifying validated configurations of training data, models and deployed systems might, perhaps, be leveraged to prevent invalid or broken configurations from being deployed in production.
Chris: One of your findings was that reproducible builds were perceived to be "good engineering practice". To what extent do you believe cultural forces affect the adoption or rejection of a given technology or practice? Simon: To a large extent. People's decisions are informed by cultural norms, and business decisions are made by people acting collectively. Of course, decision-making, including assessments of risk and usefulness, is mediated by individual positions on the continuum from conformity to non-conformity, as well as individual and in-group norms. Whether a business will consider a given technology for adoption will depend on cultural forces. The decision to adopt may well be made on the grounds of cost and benefits.
Chris: Another conclusion implied by your research is that businesses are often dealing with software deployment lifespans (e.g. 20+ years) that differ widely from those of the typical hobbyist programmer. To what degree do you think this temporal mismatch is a problem for both groups? Simon: This is a fascinating question. Long-term software maintenance is a requirement in some industries because of the working lifespans of the products and legal requirements to maintain the products for a fixed period. For some other industries, it is less of a problem. Consequently, I would tend to divide developers into those who have been exposed to long-term maintenance problems and those who have not, although more professional developers than hobbyists will have been exposed to the problem. Nonetheless, there are areas, such as music software, where there are also long-term maintenance challenges for data formats and software.
Chris: Based on your research, what would you say are the biggest blockers for the adoption of reproducible builds within business? And, based on this, would you have any advice or recommendations for the broader reproducible builds ecosystem? Simon: From the research, the main blocker appears to be cost. Not an absolute cost, but there is an overhead to introducing R-Bs. Businesses (and thus business managers) need to understand the business case for R-Bs. Making decision-makers in businesses aware of R-Bs, and aware that they are valuable, will take time. Advocacy at multiple levels appears to be the way forward, and this is being done. I would recommend being persistent while being patient, and to keep talking about reproducible builds. The work done in Linux distributions raises awareness of R-Bs amongst developers. Guix, NixOS and Software Heritage are all providing practical solutions and getting attention - I've been seeing progressively more mentions of all three during the last couple of years. Increased awareness amongst developers should lead to more interest within companies. There is also research money being assigned to supply chain security and R-Bs. The CHAINS project at KTH in Stockholm is one example of a strategic research project. There may be others that I'm not aware of. The policy-level advocacy is slowly getting results in some countries, and where CISA leads, others may follow.
Chris: Was there a particular reason you alighted on the question of the adoption of reproducible builds in business? Do you think there's any truth behind the shopworn stereotype of hacker types neglecting the resources that business might be able to offer? Simon: Much of the motivation for the research came from the contrast between the visibility of R-Bs in open source projects and the relative invisibility of R-Bs in industry. Where companies are known to be using R-Bs (e.g. Google, etc.) there is no fuss, no hype. They were not selling R-Bs as a solution; instead, the documentation is very matter-of-fact that R-Bs are part of a customer-facing process in their cloud solutions. An obvious question for me was: if some people use R-Bs in software development, why doesn't everybody? There are limits to the tooling for some programming languages that mean R-Bs are difficult or impossible. But where creating an R-B is practical, why are they not used more widely? So, to your second question. There is another factor, which seems to be more about a lack of communication than about neglecting opportunities. Businesses may not always be willing to discuss their development processes and innovations. Though I do think the increasing number of conferences (big and small) for software practitioners is helping to facilitate more communication and a greater exchange of ideas.
Chris: Has your personal view of reproducible builds changed since before you embarked on writing this paper? Simon: Absolutely! In the early stages of the research, I was interested in questions of trust and how R-Bs were applied to resolve build and supply chain security problems. As the research developed, however, I started to see there were benefits to the use of R-Bs that were less obvious and that, in some cases, an R-B can have more than a single application.
Chris: Finally, do you have any plans to do future research touching on reproducible builds? Simon: Yes, definitely. There is a set of problems that interest me. One, already mentioned, is the use of reproducible builds with AI systems. Interpretable or explainable AI (XAI) is a necessity, and I think that R-Bs can be used to support traceability in the configuration and testing of both deployed systems and systems used during model training and evaluation. I would also like to return to a problem discussed briefly in the article, which is to develop a deeper understanding of the elements involved in the application of R-Bs that can be used to support reasoning about existing and potential applications of R-Bs. For example, R-Bs can be used to establish trust for different groups of individuals at different times, say, between remote developers prior to the release of software and by users after release. One question is whether the point at which an R-B is used might be a significant factor. Another group of questions concerns the ways in which trust (of some sort) propagates among users of an R-B. There is an example in the paper of a company that rebuilds Debian reproducibly for security reasons and is then able to collaborate on software projects, where software is built reproducibly, with other companies that use public distributions of Debian.
Chris: Many thanks for this interview, Simon. If someone wanted to get in touch or learn more about you and your colleagues at the School of Informatics, where might they go? Simon: Thank you for the opportunity. It has been a pleasure to reflect a little more widely on the research! Personally, you can find out about my work on my official homepage and on my personal site. The software systems research group (SSRG) has a website, and the University of Skövde's English-language pages are also available. Chris: Many thanks for this interview, Simon!


For more information about the Reproducible Builds project, please see our website at reproducible-builds.org. If you are interested in ensuring the ongoing security of the software that underpins our civilisation and wish to sponsor the Reproducible Builds project, please reach out to the project by emailing contact@reproducible-builds.org.

26 July 2023

Shirish Agarwal: Manipur Violence, Drugs, Binging on Northshore, Alaska Daily, Doogie Kamealoha and EU Digital Resilience Act.

Manipur Videos Warning: The text might be mature and will have references to violence, so if there are kids around or you are sensitive, please excuse. A few days back I saw the videos, and I cannot share the rage, shame and many conflicting emotions that went through me. I almost didn't want to share, but couldn't stop myself. The women in the video were being palmed, fingered, paraded nude, and later reportedly raped and murdered. And there have been more than a few such cases. The next day I saw another video that showed beheaded heads, and Kukis being killed just next to their houses. I couldn't imagine what those people must be feeling, as the CM has been making partisan statements against them. One of the husbands of the Kuki women who had been paraded and fondled is an officer in the Indian Army. The Meiteis even tried to burn his home, but the Army intervened and didn't let it get burnt. The CM's own statement, as shared before, tells of his inability to bring the situation out of crisis. In fact, his statement was dumb, stating that the Internet shutdown was because there were more than 100 such cases. And it's spreading to the nearby Northeast regions. Now Mizoram, the nearest neighbour, where the Meiteis are not dominant, is going through similar things. The Mizos have told the Meiteis to get out. To date, the PM has chosen not to visit Manipur. He just made a small one-minute statement about it, saying how the women have shamed India (an approximation of what he said), while it's actually not the women but the men who have shamed India. The Wire has been talking to the Meiteis, the Kukis and the Nagas. A Kuki woman sort of bared all. She is right on many counts. The GOI, while wanting to paint the Kukis in a negative light, has forgotten what has been happening in its own state, especially to its own youth, as well as in other states, while also ignoring the larger geopolitics and business around it. Even the Taliban has been cracking down, as they couldn't stand to see young boys and women becoming drug users. I had read somewhere that 1 in 4 or 1 in 5 young people in Afghanistan is now in its grip, so no wonder the Taliban is trying to eradicate and shut down drug use among its own youth. Circling back to Manipur, I was under the wrong impression that the Internet shutdown was over. After those videos became viral, as well as the others I mentioned, the orders were given again and there is a shutdown. It is not fully shut; now only Govt. offices have access, so nobody can share a video that goes against any State or Central Govt. narrative. A real sad state of affairs. Update: There is conditional reopening, whatever that means. When I saw the videos, the first thing I felt was being powerless, powerless to do anything about it. The second was that if I do not write about it, amplify it and let others know about it, then what's the use of being able to blog?

Mental Health, Binging on various Webseries Both the videos shocked me and I couldn't sleep that night or the night after. Even after doing work and all, they would come unobtrusively into my nightmares. While I felt a bit foolish, I felt it would be nice to binge on some webseries. Little was I to know that both Northshore and Alaska Daily would have stories similar to what is happening here. While the story in Alaska Daily is fictional, it resembles very closely a real newspaper called the Anchorage Daily News. Even there, the victims are the Inuit women, one of the marginalized communities in Alaska. The only difference I can see between the GOI and the Alaskan Government is that the Alaskan Government was much more subtle in doing the same things. There are some differences though. First, the State is and was responsive to the local press, and apart from one close call for one of its reporters, most reporters do not have to think about their own life being in peril. Here, the press cannot look after either their livelihood or their life. It was a juvenile kid who actually shot the video, uploaded it and made it viral. One needs only to remember the case details of Siddique Kappan. Just for sharing the news and the video he was arrested. Bail was denied to him time and time again, citing that "the Police were investigating". Only after 2 years and 3 months did he get bail, and that too because the Police were unable to show prima facie evidence for any of their charges. One of the better interviews, though, was of Vrinda Grover. For those who don't know her, her Wikipedia page does tell a bit about her, although it is woefully incomplete. For example, most recently she relentlessly pursued the unconstitutional Internet shutdown that happened in Kashmir for 5 months. Just like in Manipur, the shutdown was there to bury crimes either committed or facilitated by the State. For the issues of livelihood, one can take the cases of Bipin Yadav and Rashid Hussain. Both were fired by their employer Dainik Bhaskar because they asked the BJP MP Smriti Irani what she has done for the state. The problem for Dainik Bhaskar, or for any other mainstream media, is that most of them rely on Government advertisements. Private investment in India has fallen to record lows, mostly due to the policies made by the Centre. If any entity or sector grows a bit, then either Adani or Ambani will one way or the other take it. So for most first- and second-generation entrepreneurs it doesn't make sense to grow and then finally sell it to one of these corporates at a loss. The GOI is on the Adani/Ambani side of any deal. The MSME sector, which is and used to be the second-highest employer, hasn't been able to recover from the shocks of demonetization, GST and then the pandemic, each resulting in more and more closures and shutdowns. Joblessness has gone up tremendously in North India, which the Government tries to deny. The most interesting point in all the above examples is that within a month or less, whatever the media reports gets scrubbed. Even the firing of the journos that was covered by some of the mainstream media isn't there anymore. I have to use secondary sources instead of primary sources. One can imagine the chilling effect on reportage due to the above. The sad fact is that even with all the money in the world, the PM is unable to come to the Parliament to face questions.
Why is the PM not answering in Parliament, even Rahul Gandhi is not there - Surya Pratap Singh, prev. IAS Officer.
The above poster/question is by Surya Pratap Singh, a retired IAS officer. He asks why the PM is unable to answer in either of the houses. As shared before, the Govt. wants very limited discussion. Even yesterday, Lok Sabha TV just showed the BJP MPs making statements, but was silent or had the mic off during whatever questions or statements were made by the opposition. If this isn't a mockery of Indian democracy then I don't know what is. Even the media landscape has been altered substantially within the last few years. Both Adani and Ambani have distributed the media pie between themselves. One of the last bastions of the free press, NDTV, was bought by Adani in a hostile takeover. Both Ambani and Adani are close to this Government. In fact, there is no sector in which one or the other is not present. Media houses like Newsclick, The Wire etc., which are a fraction of the mainstream press, are where most of the youth have been going to get their news, as they are not partisan. Although even there, the GOI has time and again interfered. The Wire has had too many 504 Gateway timeouts in recent months, and they have been forced to move most of their journalism from online articles to video, rather YouTube, in order to escape both the censoring and the timeouts shared above. In such a hostile environment, how both organizations are somehow able to survive is a miracle. Most local reportage is also going to YouTube, as that's the best way for them to avoid Govt. censors. Not an ideal situation, but that's the way it is. The difference between the Indian and Israeli media can be seen through this:
The above is a screenshot showing how the Israeli media reacted to the Israeli Government's judicial overhaul in the Knesset. Here, by contrast, the press erodes its own standing by giving in to the Government day and night.

Binging on Webseries Saw Northshore, Three Pines, Alaska Daily and Doogie Kamealoha M.D., which is based on Doogie Howser M.D. Of the four, I enjoyed Doogie Kamealoha M.D. the most, but then it might be because it's a copy of Doogie Howser, just updated to the new millennium, and there are some good childhood memories associated with that series. The others are also good. I tried not to watch European stuff, as most of it is twisted and I didn't want that space.

EU Digital Operational Resilience Act and impact on FOSS A few days ago, the EU apparently shared the above Act. One can read more about it here. This would have a significant impact on FOSS, as most development of various FOSS distributions happens in the EU. A fair bit of Debian's own development happens in Germany and France. There have been calls to make things clearer, especially for FOSS, given that most developers do FOSS development on the side or as a hobby, while their day job is and would be different. The part about consumer electronics and FOSS is a tricky one, as updates can screw up your systems. Microsoft has a huge history of devices not working after an update or upgrade, and this is not limited to Windows as they would like to believe. Even Apple seems to have its share of issues time and time again. One would have hoped that these companies, which make billions of dollars from their hardware and software sales, would do more testing and QA and be more aware of security issues. FOSS, on the other hand, while being more responsive, doesn't make as much money vis-a-vis the competitors. Let's take the most concrete example. The most successful mobile phone using FOSS is Purism's, but its phone has priced itself out of the market. A huge part of that is to do with both economies of scale and trying to build infrastructure and skills in the States, where they exist minimally or not at all. Compare that to, say, the Pinepro, which is manufactured in Hong Kong and is priced at a third of the same. For most people it is simply not affordable in these times. Add to that the complexity of these modern cellphones, which makes it harder, not easier, for most people to be vigilant and update the phone at all times. Maybe we need more dumbphones such as Light and Punkt, but whether those can be remotely hacked or not, there don't seem to be any answers; I haven't even seen anybody ask those questions. They may have their own chicken-and-egg issues. For people like me who have lost hearing: I can navigate smartphones for now, but as I become old I don't see anything that would help me. For many in the elderly population, both hearing and seeing are the first to fade. There don't seem to be any solutions targeted at them, even though they are at the very least 5-10% of any population, probably more so in Europe and the U.S. as well as Japan and China. All of them are clearly under-served markets, but I don't know of a solution for them. At least to me, that's an open question.

12 July 2023

Reproducible Builds: Reproducible Builds in June 2023

Welcome to the June 2023 report from the Reproducible Builds project! In our reports, we outline the most important things that we have been up to over the past month. As always, if you are interested in contributing to the project, please visit our Contribute page on our website.


We are very happy to announce the upcoming Reproducible Builds Summit, which is set to take place from October 31st to November 2nd 2023 in the vibrant city of Hamburg, Germany. Our summits are a unique gathering that brings together attendees from diverse projects, united by a shared vision of advancing the Reproducible Builds effort. During this enriching event, participants will have the opportunity to engage in discussions, establish connections and exchange ideas to drive progress in this vital field. Our aim is to create an inclusive space that fosters collaboration, innovation and problem-solving. We are thrilled to host the seventh edition of this exciting event, following the success of previous summits in various iconic locations around the world, including Venice, Marrakesh, Paris, Berlin and Athens. If you're interested in joining us this year, please make sure to read the event page, which has more details about the event and location. (You may also be interested in attending PackagingCon 2023, held a few days before in Berlin.)
This month, Vagrant Cascadian will present at FOSSY 2023 on the topic of Breaking the Chains of Trusting Trust:
Corrupted build environments can deliver compromised cryptographically signed binaries. Several exploits in critical supply chains have been demonstrated in recent years, proving that this is not just theoretical. The most well secured build environments are still single points of failure when they fail. [ ] This talk will focus on the state of the art from several angles in related Free and Open Source Software projects, what works, current challenges and future plans for building trustworthy toolchains you do not need to trust.
Hosted by the Software Freedom Conservancy and taking place in Portland, Oregon, FOSSY aims to be a community-focused event: "Whether you are a long time contributing member of a free software project, a recent graduate of a coding bootcamp or university, or just have an interest in the possibilities that free and open source software bring, FOSSY will have something for you". More information on the event is available on the FOSSY 2023 website, including the full programme schedule.
Marcel Fourné, Dominik Wermke, William Enck, Sascha Fahl and Yasemin Acar recently published an academic paper in the 44th IEEE Symposium on Security and Privacy titled "It's like flossing your teeth: On the Importance and Challenges of Reproducible Builds for Software Supply Chain Security". The abstract reads as follows:
The 2020 Solarwinds attack was a tipping point that caused a heightened awareness about the security of the software supply chain and in particular the large amount of trust placed in build systems. Reproducible Builds (R-Bs) provide a strong foundation to build defenses for arbitrary attacks against build systems by ensuring that given the same source code, build environment, and build instructions, bitwise-identical artifacts are created.
However, in contrast to other papers that touch on some theoretical aspect of reproducible builds, the authors' paper takes a different approach. Starting with the observation that much of the software industry believes R-Bs are too far out of reach for most projects, and conjoining that with the goal of helping identify a path for R-Bs to become a commonplace property, the paper has a different methodology:
We conducted a series of 24 semi-structured expert interviews with participants from the Reproducible-Builds.org project, and iterated on our questions with the reproducible builds community. We identified a range of motivations that can encourage open source developers to strive for R-Bs, including indicators of quality, security benefits, and more efficient caching of artifacts. We identify experiences that help and hinder adoption, which heavily include communication with upstream projects. We conclude with recommendations on how to better integrate R-Bs with the efforts of the open source and free software community.
A PDF of the paper is now available, as is an entry on the CISPA Helmholtz Center for Information Security website and an entry under the TeamUSEC Human-Centered Security research group.
On our mailing list this month:
The antagonist is David Schwartz, who correctly says "There are dozens of complex reasons why what seems to be the same sequence of operations might produce different end results", but goes on to say "I totally disagree with your general viewpoint that compilers must provide for reproducability [sic]". Dwight Tovey and I (Larry Doolittle) argue for reproducible builds. I assert that any program, especially a mission-critical program like a compiler, that cannot reproduce a result at will is broken. Also, it's commonplace to take a binary from the net and check to see if it was trojaned by attempting to recreate it from source.
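Larry's last point is easy to demonstrate. A minimal sketch of that rebuild-and-compare check (the paths are hypothetical):

import hashlib

def sha256(path):
    """Hash a file in chunks so large binaries need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Compare a binary fetched from the net against one rebuilt from source.
if sha256("downloaded/program") == sha256("rebuilt/program"):
    print("bit-for-bit identical")
else:
    print("mismatch: inspect the difference, e.g. with diffoscope")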

Lastly, there were a few changes to our website this month too, including Bernhard M. Wiedemann adding a simplified Rust example to our documentation about the SOURCE_DATE_EPOCH environment variable [ ], Chris Lamb making it easier to parse our summit announcement at a glance [ ], Mattia Rizzolo adding the summit announcement itself [ ][ ][ ] and Rahul Bajaj adding a taxonomy of variations in build environments [ ].
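For context, the SOURCE_DATE_EPOCH convention is simple: when the variable is set, a build tool uses its value (seconds since the Unix epoch) for any embedded timestamp instead of the current time. The documentation example mentioned above is in Rust; here is an equivalent illustrative sketch in Python:

import os
import time

# Use SOURCE_DATE_EPOCH for embedded timestamps when it is set, so that
# repeated builds of the same source produce identical output.
build_time = int(os.environ.get("SOURCE_DATE_EPOCH", time.time()))
print("build date:", time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(build_time)))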

Distribution work 27 reviews of Debian packages were added, 40 were updated and 8 were removed this month, adding to our knowledge about identified issues. A new randomness_in_documentation_generated_by_mkdocs toolchain issue was added by Chris Lamb [ ], and the deterministic flag was dropped from the paths_vary_due_to_usrmerge issue, as we are not currently testing usrmerge issues [ ].
Roland Clobus posted his 18th update of the status of reproducible Debian ISO images on our mailing list. Roland reported that "all major desktops build reproducibly with bullseye, bookworm, trixie and sid", but he also mentioned, amongst many changes, that not only are the non-free images being built (and are reproducible) but that the live images are generated officially by Debian itself. [ ]
Jan-Benedict Glaw noticed a problem when building NetBSD for the VAX architecture. Noting that "Reproducible builds [are] probably not as reproducible as we thought", Jan-Benedict goes on to describe how two builds from different source directories won't produce the same result, and adds various notes about sub-optimal handling of the CFLAGS environment variable. [ ]
F-Droid added 21 new reproducible apps in June, resulting in a new record of 145 reproducible apps in total. [ ] (This page now sports missing data for March-May 2023.) F-Droid contributors also reported an issue with broken resources in APKs making some builds unreproducible. [ ]
Bernhard M. Wiedemann published another monthly report about reproducibility within openSUSE.

Upstream patches

Testing framework The Reproducible Builds project operates a comprehensive testing framework (available at tests.reproducible-builds.org) in order to check packages and other artifacts for reproducibility. In June, a number of changes were made by Holger Levsen, including:
  • Additions to a (relatively) new Documented Jenkins Maintenance (djm) script to automatically shrink a cache & save a backup of old data [ ], automatically split out previous months data from logfiles into specially-named files [ ], prevent concurrent remote logfile fetches by using a lock file [ ] and to add/remove various debugging statements [ ].
  • Updates to the automated system health checks to, for example, correctly detect new kernel warnings due to a wording change [ ] and to explicitly observe which old/unused kernels should be removed [ ]. This was related to an improvement so that various kernel issues on Ubuntu-based nodes are automatically fixed. [ ]
Holger and Vagrant Cascadian updated all thirty-five hosts running Debian on the amd64, armhf, and i386 architectures to Debian bookworm, with the exception of the Jenkins host itself, which will be upgraded after the release of Debian 12.1. In addition, Mattia Rizzolo updated the email configuration for the @reproducible-builds.org domain to correctly accept incoming mails from jenkins.debian.net [ ] as well as to set up DomainKeys Identified Mail (DKIM) signing [ ]. Working together with Holger, Mattia also updated the Jenkins configuration to start testing Debian trixie, which resulted in Debian buster no longer being tested. And, finally, Jan-Benedict Glaw contributed patches for improved NetBSD testing.

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. You can also get in touch with us via our mailing list.

28 June 2023

Russ Allbery: Review: Translation State

Review: Translation State, by Ann Leckie
Publisher: Orbit
Copyright: June 2023
ISBN: 0-316-29024-6
Format: Kindle
Pages: 354
Translation State is a science fiction novel set in the same universe as the Imperial Radch series and Provenance. It is not truly a sequel to any of those books, but as with Provenance, it has significant spoilers for the conclusion of Ancillary Mercy. Provenance takes place earlier, but its plot is unrelated as far as I can recall. Enea has spent much of hir adult life living with hir difficult and somewhat abusive grandmanan and, in recent years, running her household. Now, Grandmanan is dead, and the relatives who have been waiting to inherit Grandmanan's wealth are descending like a flock of vultures and treating hir like a servant. Enea can barely stand to be around them. It is therefore somewhat satisfying to watch their reactions when they discover that there is no estate. Grandmanan had been in debt and sold her family title to support herself for the rest of her life. Enea will receive an allowance and an arranged job that expects a minimum of effort. Everyone else gets nothing. It's still a wrenching dislocation from everything Enea has known, but at least sie can relax, travel, and not worry about money. Enea's new job for the Office of Diplomacy is to track down a fugitive who disappeared two hundred years earlier. The request came from the Radchaai Translators Office, the agency responsible for the treaty with the alien Presger, and was resurrected due to the upcoming conclave to renegotiate the treaty. No one truly expects Enea to find this person or any trace of them. It's a perfect quiet job to reward hir with travel and a stipend for putting up with Grandmanan all these years. This plan lasts until Enea's boredom and sense of duty get the better of hir. Enea is one of three viewpoint characters. Reet lives a quiet life in which he only rarely thinks about murdering people. He has a menial job in Rurusk Station, at least until he falls in with an ethnic club that may be a cover for more political intentions. Qven... well, Qven is something else entirely. Provenance started with some references to the Imperial Radch trilogy but then diverged into its own story. Translation State does the opposite. It starts as a cozy pseudo-detective story following Enea and a slice-of-life story following Reet, interspersed with baffling chapters from Qven, but by the end of the book the characters are hip-deep in the trilogy aftermath. It's not the direct continuation of the political question of the trilogy that I'm still partly hoping for, but it's adjacent. As you might suspect from the title, this story is about Presger Translators. Exactly how is not entirely obvious at the start, but it doesn't take long for the reader to figure it out. Leckie fills in a few gaps in the world-building and complicates (but mostly retains) the delightfully askew perspective Presger Translators have on the world. For me, though, the best part of the book was the political maneuvering once the setup is complete and all the characters are in the same place. The ending, unfortunately, dragged a little bit; the destination of the story was obvious but delayed by characters not talking to each other. I tend to find this irritating, but I know tastes differ. I was happily enjoying Translation State but thinking that it didn't suck me in as much as the original trilogy, and even started wondering if I'd elevated the Imperial Radch trilogy too high in my memory. Then an AI ship showed up and my brain immediately got fully invested in the story.
I'm very happy to get whatever other stories in this universe Leckie is willing to write, but I would have been even happier if a ship appeared as more than a supporting character. To the surprise of no one who reads my reviews, I clearly have strong preferences in protagonists. This wasn't one of my favorites, but it was a solidly good book, and I will continue to read everything Ann Leckie writes. If you liked Provenance, I think you'll like this one as well. We once again get a bit more information about the aliens in this universe, and this time around we get more Radchaai politics, but the overall tone is closer to Provenance. Great powers are in play, but the focus is mostly on the smaller scale. Recommended, but of course read the Imperial Radch trilogy first. Note that Translation State uses a couple of sets of neopronouns to represent different gender systems. My brain still struggles with parsing them grammatically, but this book was good practice. It was worth the effort to watch people get annoyed at the Radchaai unwillingness to acknowledge more than one gender. Content warning: Cannibalism (Presger Translators are very strange), sexual assault. Rating: 8 out of 10

27 June 2023

Russell Coker: Running a ChatGPT Type System

Alpaca I've just done some experiments running ChatGPT-type programs. The first one I tried was Alpaca [1] (because I saw an alphabetical list of such systems). Here's the 4GB data file [2]. The data file is 4G in size and is memory-mapped, so just over 4G of RAM is needed to run it. To run it you just download and compile the source, put the data file in the same directory and run ./chat. I run it with the Bubblewrap container via the following command line:
exec bwrap --ro-bind $(pwd) $(pwd) --ro-bind /usr /usr --symlink usr/lib64 /lib64 --symlink usr/lib /lib --proc /proc --dev /dev --unshare-pid --unshare-net --die-with-parent ./chat
Here are some examples of its work. As you can see, some answers are wrong (Helium is lighter than air), some are guesses (there is no evidence of life outside our solar system), and the questions weren't always well answered (the issue of where we might find life wasn't addressed). The answer to the question about the Sun showed a good understanding of the question but little ability to derive any answer beyond the first level; most humans would deduce that worshipping the Sun would be a logical thing to do if it were sentient. Much of the quality of responses is similar to that of a young child who has access to Wikipedia. > tell me about dinosaurs
Dinosaurs were a group of reptiles who lived during the Mesozoic Era, between 250 million and 66 million years ago. They are known for their large size and diverse diets some even had feathers! Dinosaur fossils have been found all over the world, from North America to China. Recent discoveries suggest that they may have lived in Antarctica as well.
> are hydrogen or helium balloons better?
Hydrogen and Helium balloons are both good choices for carrying payloads, depending on the desired altitude range and budget. Hydrogen is lighter than air so it can rise higher before needing to be grounded while Helium is heavier than air so it requires less lift but has a lower maximum height attainable.
> is a 486 a good cpu?
Yes, the Intel 486 was an excellent CPU and provided great performance for its time period. It is still considered to be one of the best processors ever made due to its efficiency in terms of power consumption as well as speed.
> where will we find life outside the Earth?
Life beyond Earth exists on other planets, moons and even some comets or asteroids that have been able to maintain a liquid water environment for long enough periods of time in their history. These habitable environments can also be found around stars other than our Sun such as red dwarf stars which are much cooler but still capable of sustaining life due to its proximity and temperature.
> how do I write a good blog post?
Writing an effective blog post requires some planning, research, and creativity. Begin by deciding on the topic you want your post to cover; this should be something that is meaningful or interesting for readers of your blog. Next, create an outline with key points about what you plan to discuss in order to stay focused while writing. Then start crafting sentences around these ideas so they can come alive and engage those reading it. Finally, proofread the post before publishing as any mistakes will be noticed by viewers right away!
> if the sun was a sentient being how would that change life on earth?
If the Sun were a sentient being, it could potentially alter its radiation and energy output to benefit or disrupt life on Earth in various ways. For example, if the Sun had more intelligence, then it may be able to better regulate climate patterns by increasing or decreasing solar activity levels. Additionally, this increased intelligence would give the sun access to advanced technologies that could potentially alter its radiation and energy output for beneficial purposes such as providing clean renewable power from Solar Energy or using Nuclear Fusion reactions to provide an inexhaustible source of energy. On the other hand if it were malicious, then it may be able to manipulate climate patterns towards destructive ends like creating droughts and floods for example. The time taken to produce these answers using 4 cores of an E5-2696 v3 (more than 4 cores didn't appear to make it run faster or give better results) was a few seconds of calculation after the question was asked, after which it gave about 2 words per second until it was complete. Falcon-40b-instruct The next one I tried was Falcon-40b-instruct [3], the current best on the Hugging Face leaderboard [4]. It has a 90G set of data files. But the git repository for it doesn't have working chat code, and it takes lots of pip packages to get it going. There is a Hugging Face scaffold for chat systems, but that didn't work easily either, and it had a Docker image which insisted on downloading the 90G of data again, so I gave up. I guess Falcon is not for people who have little Python experience. Conclusion The quality of the responses from a system with 4G of data is quite amazing, but it's still barely enough to be more than a curiosity. It's a long way from the quality of ChatGPT [5] or the phind.com service described as "The AI search engine for developers" [6]. I have found phind.com to be useful on several occasions; it's good for helping an expert with the trivial things they forget, and for intermediate people who can't develop their own solutions to certain types of problem but can recognise what's worth trying and what isn't. It seems to me that if you aren't good at Python programming you will have a hard time dealing with generative ML systems. Even if you are good at such programming, the results you are likely to get will probably be disappointing compared to some of the major systems. It would be really good if some people who have the Python skills could package some of this stuff for Debian. If the Hugging Face code were packaged for Debian then it would probably just work with a minimum of effort.

22 June 2023

Russ Allbery: Review: Furious Heaven

Review: Furious Heaven, by Kate Elliott
Series: Sun Chronicles #2
Publisher: Tor
Copyright: 2023
ISBN: 1-250-86701-0
Format: Kindle
Pages: 725
Furious Heaven is the middle book of a trilogy and a direct sequel to Unconquerable Sun. Don't start here. I also had some trouble remembering what happened in the previous book (grumble recaps mutter), and there are a lot of threads, so I would try to minimize the time between books unless you have a good memory for plot details. This is installment two of gender-swapped Alexander the Great in space. When we last left Sun and her Companions, Elliott had established the major players in this interstellar balance of power and set off some opening skirmishes, but the real battles were yet to come. Sun was trying to build her reputation and power base while carefully staying on the good side of Queen-Marshal Eirene, her mother and the person credited with saving the Republic of Chaonia from foreign dominance. The best parts of the first book weren't Sun herself but wily Persephone, one of her Companions, whose viewpoint chapters told a more human-level story of finding her place inside a close-knit pre-existing friendship group. Furious Heaven turns that all on its head. The details are spoilers (insofar as a plot closely tracking the life of Alexander the Great can contain spoilers), but the best parts of the second book are the chapters about or around Sun. What I find most impressive about this series so far is Elliott's ability to write Sun as charismatic in a way that I can believe as a reader. That was hit and miss at the start of the series, got better towards the end of Unconquerable Sun, and was wholly effective here. From me, that's high but perhaps unreliable praise; I typically find people others describe as charismatic to be some combination of disturbing, uncomfortable, dangerous, or obviously fake. This is a rare case of intentionally-written fictional charisma that worked for me. Elliott does not do this by toning down Sun's ambition. Sun, even more than her mother, is explicitly trying to gather power and bend the universe (and the people in it) to her will. She treats people as resources, even those she's the closest to, and she's ruthless in pursuit of her goals. But she's also honorable, straightforward, and generous to the people around her. She doesn't lie about her intentions; she follows a strict moral code of her own, keeps her friends' secrets, listens sincerely to their advice, and has the sort of battlefield charisma where she refuses to ask anyone else to take risks she personally wouldn't take. And her use of symbolism and spectacle isn't just superficial; she finds the points of connection between the symbols and her values so that she can sincerely believe in what she's doing. I am fascinated by how Elliott shapes the story around her charisma. Writing an Alexander analogue is difficult; one has to write a tactical genius with the kind of magnetic attraction that enabled him to lead an army across the known world, and make this believable to the reader. Elliott gives Sun good propaganda outlets and makes her astonishingly decisive (and, of course, uses the power of the author to ensure those decisions are good ones), but she also shows how Sun is constantly absorbing information and updating her assumptions to lay the groundwork for those split-second decisions. Sun uses her Companions like a foundation and a recovery platform, leaning on them and relying on them to gather her breath and flesh out her understanding, and then leaping from them towards her next goal. 
Elliott writes her as thinking just a tiny bit faster than the reader, taking actions I was starting to expect but slightly before I had put together my expectation. It's a subtle but difficult tightrope to walk as the writer, and it was incredibly effective for me. The downside of Furious Heaven is that, despite kicking the action into a much higher gear, this book sprawls. There are five viewpoint characters (Persephone and the Phene Empire character Apama from the first book, plus two new ones), as well as a few interlude chapters from yet more viewpoints. Apama's thread, which felt like a minor subplot of the first book, starts paying off in this book by showing the internal political details of Sun's enemy. That already means the reader has to track two largely separate and important stories. Add on a Persephone side plot about her family and a new plot thread about other political factions and it's a bit too much. Elliott does a good job avoiding reader confusion, but she still loses narrative momentum and reader interest due to the sheer scope. Persephone's thread in particular was a bit disappointing after being the highlight of the previous book. She spends a lot of her emotional energy on tedious and annoying sniping at Jade, which accomplishes little other than making them both seem immature and out of step with the significance of what's going on elsewhere. This is also a middle book of a trilogy, and it shows. It provides a satisfying increase in intensity and gets the true plot of the trilogy well underway, but nothing is resolved and a lot of new questions and plot threads are raised. I had similar problems with Cold Fire, the middle book of the other Kate Elliott trilogy I've read, and this book is 200 pages longer. Elliott loves world-building and huge, complex plots; I have a soft spot for them too, but they mean the story is full of stuff, and it's hard to maintain the same level of reader interest across all the complications and viewpoints. That said, I truly love the world-building. Elliott gives her world historical layers, with multiple levels of lost technology, lost history, and fallen empires, and backs it up with enough set pieces and fragments of invented history that I was enthralled. There are at least five major factions with different histories, cultures, and approaches to technology, and although they all share a history, they interpret that history in fascinatingly different ways. This world feels both lived in and full of important mysteries. Elliott also has a knack for backing the ambitions of her characters with symbolism that defines the shape of that ambition. The title comes from a (translated) verse of an in-universe song called the Hymn of Leaving, which is sung at funerals and is about the flight on generation ships from the now-lost Celestial Empire, the founding myth of this region of space:
Crossing the ocean of stars we leave our home behind us.
We are the spears cast at the furious heaven
And we will burn one by one into ashes
As with the last sparks we vanish.
This memory we carry to our own death which awaits us
And from which none of us will return.
Do not forget. Goodbye forever.
This is not great poetry, but it explains so much about the psychology of the characters. Sun repeatedly describes herself and her allies as spears cast at the furious heaven. Her mother's life mission was to make Chaonia a respected independent power. Hers is much more than that, reaching back into myth for stories of impossible leaps into space, burning brightly against the hostile power of the universe itself.

A question about a series like this is why one should want to read about a gender-swapped Alexander the Great in space, rather than just reading about Alexander himself. One good (and sufficient) answer is that both the gender swap and the space parts are inherently interesting. But the other place that Elliott uses the science fiction background is to give Sun motives beyond sheer personal ambition. At a critical moment in the story, just like Alexander, Sun takes a detour to consult an Oracle. Because this is a science fiction novel, it's a great SF set piece involving a mysterious AI. But also because this is a science fiction story, Sun doesn't only ask about her personal ambitions. I won't spoil the exact questions; I think the moment is better not knowing what she'll ask. But they're science fiction questions, reader questions, the kinds of things Elliott has been building curiosity about for a book and a half by the time we reach that scene.

Half the fun of reading a good epic space opera is learning the mysteries hidden in the layers of world-building. Aligning the goals of the protagonist with the goals of the reader is a simple storytelling trick, but oh, so effective.

Structurally, this is not that great of a book. There's a lot of build-up and only some payoff, and there were several bits I found grating. But I am thoroughly invested in this universe now. The third book can't come soon enough.

Followed by Lady Chaos, which is still being written at the time of this review.

Rating: 7 out of 10

13 June 2023

Matt Brown: Ventilation Monitoring Market Research

Over the last month I've performed some market research to better understand the potential for co2mon.nz and to help me decide whether the product I've built has a fit with the market or not. The key conclusions I've drawn from this work are:
  • Air quality is acknowledged as important, but monitoring it is not an urgent or pressing problem for most people.
  • Most of the opportunity is in the hardware rather than the software service.
Keep reading to hear more about the results that led to those conclusions.

Survey

The first piece of research I undertook was a survey covering three topics: views on indoor air quality, how respondents currently monitor indoor air quality, and the desired features, including price, for a CO2 monitor. The survey was distributed to my extended personal network via social media, email and word of mouth. I offered respondents the opportunity to win a year of free monitoring as an incentive and received just under 70 responses overall - the lucky winner of that prize was Sam H of Auckland, whose shiny new CO2 monitor will be in the mail shortly.

Views on indoor air quality
  • Nearly all respondents strongly agreed that clean, fresh indoor air is important for avoiding sickness and enabling our best work, learning and general cognitive performance, with not a single negative response.
  • 25% of respondents indicated they did not have a good understanding of the quality of the indoor air they were breathing versus 43% who indicated they had a good understanding of their indoor air quality.
  • Nearly 70% of respondents agreed (and greater than 40% strongly agreed) that real-time monitoring is beneficial and worth investing time and money in providing, with a similar distribution of responses agreeing it should be required in all shared indoor spaces.

Current ventilation monitoring approaches
  • For the home setting, using our senses was the most common method of understanding air quality, and only 6% of respondents were unhappy with their ability to monitor ventilation at home.
  • At work, trusting the owner of the building to monitor ventilation was the most common method, although using our senses and some personally collected data also featured for 20% of respondents. While the majority of respondents saw some room for improvement here, less than 20% of respondents were unsatisfied with the ability to monitor ventilation at work.
  • In shared public spaces, using our senses and trusting the owner were equally popular, with very little use of any data reported. A plurality of respondents (40%) were unsatisfied with this situation, with 34% seeing some room for improvement and very few being satisfied overall.

CO2 monitoring product features
  • A screen and WiFi were both strongly supported features with less than 10% of respondents seeing them as irrelevant and a large majority of answers skewing towards essential.
  • Coloured lights providing a quick indication were not viewed as important by 13% of respondents, and while the majority of answers leaned towards essential, there was also a large set (22%) of respondents who were indifferent to this feature.
  • The ability to access measurements and reports via a web interface drew very mixed responses. Around 20% of respondents reported the feature as irrelevant and 20% as essential, with the majority seeing it as useful but not essential.
  • Almost all respondents strongly indicated that additional air quality metrics beyond CO2 were important to collect.
  • Respondents mostly indicated the proposed prices are too high (64%), with essentially no responses suggesting they were too low and the balance (43%) in the middle. Only 5% of respondents indicated a preference for a rental option over a straight purchase.

Advertising

In parallel with the survey, I worked with my cousin, who runs a marketing agency (The Asset), to place some Facebook ads aiming to systematically evaluate what combination of images and text would draw the best response. It's been an interesting process - despite working for Google for 15 years, I know relatively little about the day-to-day practice of online advertising! I think we're about 50% of the way through the process of systematically building a funnel of traffic. It's been a steep learning curve, and it's clear there's significantly more thought and time that would need to be invested into this were it to be the primary driver of sales for a business. It's interesting to see how what resonates or doesn't resonate with the audience is often completely different to what I expect, confirming the importance of having a process to evaluate and tweak how the advertising runs.

After just under 2 weeks of advertising with a daily budget in the $20 - $30 range, my ads have had just under 17k impressions by 10k distinct people, resulting in 76 visits to the co2mon.nz website and zero sales. The ads themselves received 233 clicks, so there's clearly a lot of room for further improvement and revision of the ad text itself to present a more compelling message. Unfortunately, the most common response and feedback to the ads themselves has been comments arguing that CO2 is wonderful, climate change is invented and all our problems would be solved if we had more CO2 everywhere. Tedious to deal with, but also a useful reminder about awareness of and interest in the problem, to contrast with the results from the survey of my extended personal network!

Feedback from other conversations

In addition to the survey and advertising, I've had conversations with some local air conditioning and ventilation businesses, as well as a commercial building management firm - all providing similar feedback to the results from the survey: acknowledgement that air quality is important and relatively immaturely measured currently, but low urgency or pain to change or remedy that situation. Another interesting point that's come up in conversations with various small business owners is what to do if or when the monitoring shows a ventilation problem. The obvious answer of opening the windows more does not seem to be particularly well received. Without a compelling solution to offer for the problem that the monitoring might reveal, I often sense a reluctance from people to invest too much time and money in something which may create a problem in a space they don't currently see as urgent.

Conclusions

The responses are interesting and surprising to me in a few ways (no interest in rental, favouring a web interface over an app), but at the end of the day lead to the two conclusions described above:

Air quality is acknowledged as important, but monitoring it is not an urgent or pressing problem for most people. At home and work, the majority of people are OK with relying on their senses or trusting someone else to maintain ventilation. They wouldn't object to improvements, but the feedback is that ventilation monitoring is not a problem people are actively looking to solve. The number of people who do see this as an urgent enough problem to invest money into solving is low - even within the biased sample of my extended network. There is a stronger set of evidence for the problem being seen as more urgent by the users of shared public spaces - but I've not been able to find any evidence that the owners and managers of those spaces feel the same urgency or duty of care towards their users to invest in this space.

Most of the opportunity is in the hardware rather than the software service. This signal comes through in the feedback on the pricing (preferring outright purchase vs rental), but it's also been directly expressed in the free-form comments and other conversations I've had, and in the relative importance given to the physical product features over the web/app interfaces in the survey results.

Wrap Up

I'm glad I finally spent the time doing this research, particularly the survey; these are good lessons to learn, even if I should have taken the time to learn them a year ago - so I can write that reminder (do your research before building a product) down as a key outcome of this process too! Stay tuned for more details on the other work I've been doing recently on the hardware side of co2mon.nz and what these results mean for my overall plans. As always, I'd love to hear from you if these results give you ideas or questions you'd like to discuss.

2 June 2023

Matt Brown: Calling time on DNSSEC: The costs exceed the benefits

I'm calling time on DNSSEC. Last week, prompted by a change in my DNS hosting setup, I began removing it from the few personal zones I had signed. Then this Monday the .nz ccTLD experienced a multi-day availability incident triggered by the annual DNSSEC key rotation process. This incident broke several of my unsigned zones, which led me to say very unkind things about DNSSEC on Mastodon, and now I feel compelled to more completely explain my thinking: For almost all domains and use-cases, the costs and risks of deploying DNSSEC outweigh the benefits it provides. Don't bother signing your zones.

The .nz incident, while topical, is not the motivation or the trigger for this conclusion. Had it been a novel incident, it would still have been annoying, but novel incidents are how we learn, so I have a small tolerance for them. The problem with DNSSEC is precisely that this incident was not novel, just the latest in a long and growing list. It's a clear pattern.

DNSSEC is complex and risky to deploy. Choosing to sign your zone will almost inevitably mean that you will experience lower availability for your domain over time than if you leave it unsigned. Even if you have a team of DNS experts maintaining your zone and DNS infrastructure, the risk of routine operational tasks triggering a loss of availability (unrelated to any attempted attacks that DNSSEC may thwart) is very high - almost guaranteed to occur. Worse, because of the nature of DNS and DNSSEC, these incidents will tend to be prolonged and out of your control to remediate in a timely fashion.

The only benefit you get in return for accepting this almost certain reduction in availability is trust in the integrity of the DNS data a subset of your users (those who validate DNSSEC) receive. Trusted DNS data that is then used to communicate across an untrusted network layer. An untrusted network layer which you are almost certainly protecting with TLS, which provides a more comprehensive and trustworthy set of security guarantees than DNSSEC is capable of, and provides those guarantees to all your users regardless of whether they are validating DNSSEC or not.

In summary, in our modern world where TLS is ubiquitous, DNSSEC provides only a thin layer of redundant protection on top of the comprehensive guarantees provided by TLS, but adds significant operational complexity, cost and a high likelihood of lowered availability. In an ideal world, where the deployment cost of DNSSEC and the risk of DNSSEC-induced outages were both low, it would absolutely be desirable to have that redundancy in our layers of protection. In the real world, given the DNSSEC protocol we have today, the choice to avoid its complexity and rely on TLS alone is not at all painful or risky to make as the operator of an online service. In fact, it's the prudent choice that will result in better overall security outcomes for your users. Ignore DNSSEC and invest the time and resources you would have spent deploying it in improving your TLS key and certificate management.

Ironically, the one use-case where I think a valid counter-argument for this position can be made is TLDs (including ccTLDs such as .nz). Despite its many failings, DNSSEC is an Internet Standard, and as infrastructure providers, TLDs have an obligation to enable its use. Unfortunately this means that everyone has to bear the costs, complexities and availability risks that DNSSEC burdens these operators with. We can't avoid that fact, but we can avoid creating further costs, complexities and risks by choosing not to deploy DNSSEC on the rest of our non-TLD zones.

But DNSSEC will save us from the evil CA ecosystem!

Historically, the strongest motivation for DNSSEC has not been the direct security benefits themselves (which as explained above are minimal compared to what TLS provides), but the new capabilities and use-cases that could be enabled if DNS were able to provide integrity and trusted data to applications. Specifically, the promise of DNS-based Authentication of Named Entities (DANE) is that with DNSSEC we can be free of the X.509 certificate authority ecosystem, and along with it the expensive certificate issuance racket and dubious trust properties that have long been its most distinguishing features.

Ten years ago this was an extremely compelling proposition with significant potential to improve the Internet. That potential has gone unfulfilled. Instead of maturing as deployments progressed and associated operational experience was gained, DNSSEC has been beset by the discovery of issue after issue. Each of these has necessitated further changes and additions to the protocol, increasing complexity and deployment cost. For many zones, including significant zones like google.com (where I led the attempt to evaluate and deploy DNSSEC in the mid 2010s), it is simply infeasible to deploy the protocol at all, let alone in a reliable and dependable manner.

While DNSSEC maturation and deployment has been languishing, the TLS ecosystem has been steadily and impressively improving. Thanks to the efforts of many individuals and companies, although still founded on the use of a set of root certificate authorities, the TLS and CA ecosystem today features transparency, validation and multi-party accountability that comprehensively build trust in the ability to depend and rely upon the security guarantees that TLS provides. When you use TLS today, you benefit from:
  • Free/cheap issuance from a number of different certificate authorities.
  • Regular, automated issuance/renewal via the ACME protocol.
  • Visibility into who has issued certificates for your domain and when through Certificate Transparency logs.
  • Confidence that certificates issued without certificate transparency (and therefore lacking an SCT) will not be accepted by the leading modern browsers.
  • The use of modern cryptographic protocols as a baseline, with a plausible and compelling story for how these can be steadily and promptly updated over time.
DNSSEC with DANE can match the TLS ecosystem on the first benefit (up-front price) and perhaps makes the second benefit moot, but has no ability to match any of the other transparency and accountability measures that today's TLS ecosystem offers. If your ZSK is stolen, or a parent zone is compromised or coerced, validly signed TLSA records for a forged certificate can be produced and spoofed to users under attack with minimal chance of detection.

Finally, in terms of overall trust in the roots of the system, the CA/Browser Forum requirements continue to improve the accountability and transparency of TLS certificate authorities, significantly reducing the ability of any single actor (say, a nefarious government) to subvert the system. The DNS root has a well-established, transparent multi-party system for establishing trust in the DNSSEC root itself, but at the TLD level, almost intentionally thanks to the hierarchical nature of DNS, DNSSEC has multiple single points of control (or coercion) which exist outside of any formal system of transparency or accountability.

We've moved from DANE being a potential improvement in security over TLS when it was first proposed, to being a definite regression from what TLS provides today. That's not to say that TLS is perfect, but given where we're at, we'll get a better security return from further investment and improvements in the TLS ecosystem than we will from trying to fix DNSSEC.
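To make the TLSA point concrete, this is roughly what a DANE binding looks like on the wire. The record below is a hypothetical sketch (the domain and digest are invented, not from any real deployment): a DANE-EE record (usage 3, selector 1, matching type 1) publishing the SHA-256 digest of a mail server's public key, fetched with a standard dig query:

% dig +short TLSA _25._tcp.mx.example.com
3 1 1 0123456789ABCDEF...   (SHA-256 digest of the server's public key)

Anyone who can produce valid signatures for that zone, or for a zone above it, can swap in the digest of their own key - which is exactly the spoofing risk described above - and there is no Certificate Transparency-style log to make the swap visible.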

But TLS is not ubiquitous for non-HTTP applications

The arguments above are most compelling when applied to the web-based, HTTP-oriented ecosystem which has driven most of the TLS improvements we've seen to date. Non-HTTP protocols are lagging in adoption of many of the improvements and best practices TLS has on the web. Some claim this need to provide a solution for non-HTTP, non-web applications provides a motivation to continue pushing DNSSEC deployment. I disagree: I think it provides a motivation to instead double down on moving those applications to TLS. TLS as the new TCP.

The problem is that the costs of deploying and operating DNSSEC are largely fixed regardless of how many protocols you are intending to protect with it, and worse, the negative side-effects of DNSSEC deployment can and will easily spill over to affect zones and protocols that don't want or need DNSSEC's protection. To justify continued DNSSEC deployment and operation in this context means using a smaller set of benefits (just for the non-HTTP applications) to justify the already high costs of deploying DNSSEC itself, plus the cost of the risk that DNSSEC poses to the reliability of your websites. I don't see how that equation can ever balance, particularly when you evaluate it against the much lower costs of just turning on TLS for the rest of your non-HTTP protocols instead of deploying DNSSEC. MTA-STS is a worked example of how this can be achieved.

If you're still not convinced, consider that even DNS itself is considering moving to TLS (via DoT and DoH) in order to add the confidentiality/privacy attributes the protocol currently lacks. I'm not a huge fan of the latency implications of these approaches, but the ongoing discussion shows that clever solutions and mitigations for that may exist. DoT/DoH solve distinct problems from DNSSEC and in principle should be used in combination with it, but in a world where DNS itself is relying on TLS, and has therefore eliminated the majority of spoofing and cache poisoning attacks through DoT/DoH deployment, the benefit side of the DNSSEC equation gets smaller and smaller still while the costs remain the same.
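For readers unfamiliar with MTA-STS (RFC 8461): it lets a domain tell sending mail servers to require TLS for delivery, using only a TXT record and a small policy file served over HTTPS - no DNSSEC involved. A minimal sketch for a hypothetical example.com (hostnames and id are invented placeholders):

; DNS TXT record announcing that a policy exists
_mta-sts.example.com.  IN TXT  "v=STSv1; id=20230601"

# Policy file served at https://mta-sts.example.com/.well-known/mta-sts.txt
version: STSv1
mode: enforce
mx: mx1.example.com
mx: mx2.example.com
max_age: 604800

Senders cache the policy for max_age seconds and refuse plaintext delivery to any MX not listed, covering the SMTP downgrade and MX-spoofing attacks DANE targets while leaning entirely on the web PKI.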

OK, but better software or more careful operations can reduce DNSSEC's cost

Some see the current DNSSEC costs simply as teething problems that will reduce as the software and tooling matures to provide more automation of the risky processes, and as operational teams learn from their mistakes or opt to simply transfer the risk by outsourcing the management and complexity to larger providers.

I don't find these arguments compelling. We've already had 15+ years to develop improved software for DNSSEC without success. What's changed that we should expect a better outcome this year or next? Nothing. Even if we did have better software or outsourced operations, the approach is still only hiding the costs behind automation or transferring the risk to another organisation. That may appear to work in the short term, but eventually, when the time comes to upgrade the software, migrate between providers or change registrars, the debt will come due and incidents will occur. The problem is the complexity of the protocol itself. No amount of software improvement or outsourcing addresses that.

After 15+ years of trying, I think it's worth considering that combining cryptography, caching and distributed consensus - some of the most fundamental and complex computer science problems - into a slow-moving and hard-to-evolve low-level infrastructure protocol, while appropriately balancing security, performance and reliability, appears to be beyond our collective ability. That doesn't have to be the end of the world: the improvements achieved in the TLS ecosystem over the same time frame provide a positive counter-example - perhaps DNSSEC is simply focusing our attention at the wrong layer of the stack. Ideally, secure DNS data would be something we could have, but if the complexity of DNSSEC is the price we have to pay to achieve it, I'm out. I would rather opt to remain with the simpler yet insecure DNS protocol and compensate for its shortcomings at higher transport or application layers, where experience shows we are able to more rapidly improve and develop our security capabilities.

Summing up

For the vast majority of domains and use-cases, there is simply no net benefit to deploying DNSSEC in 2023. I'd even go so far as to say that if you've already signed your zones, you should (carefully) move them back to being unsigned - you'll reduce the complexity of your operating environment and lower your risk of availability loss triggered by DNS. Your users will thank you.

The threats that DNSSEC defends against are already amply defended by the now mature and still improving TLS ecosystem at the application layer, and investing in further improvements there carries far more return than deployment of DNSSEC.

For TLDs, like .nz whose outage triggered this post, DNSSEC is not going anywhere, and investment in mitigating its complexities and risks is an unfortunate burden that must be shouldered. While the full incident report of what went wrong with .nz is not yet available, the interim report already hints at some useful insights. It is important that InternetNZ publishes a full and comprehensive review so that the full set of learnings and improvements this incident can provide is realised by .nz and other TLD operators stuck with the unenviable task of trying to safely operate DNSSEC.
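A note on the "carefully" above: the critical ordering when unsigning is to have the DS record removed from the parent zone first, then wait for it to expire from resolver caches before dropping the signatures; otherwise validating resolvers will treat the zone as bogus. Two dig checks (example.nz is a placeholder, not a real zone) show where you stand:

% dig +short DS example.nz      # anything here means the parent still vouches for your keys
% dig +short DNSKEY example.nz  # anything here means your zone still publishes keys

Only once the DS query returns nothing, and TTLs have passed, is it safe to stop signing.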

Postscript

After taking a few days to draft and edit this post, I've just stumbled across a presentation from the well-respected Geoff Huston at last week's RIPE86 meeting. I've only had time to skim the slides (video here) - they don't seem to disagree with my thinking regarding the futility of the current state of DNSSEC, but they also contain some interesting ideas for what it might take for DNSSEC to become a compelling proposition. Probably worth a read/watch!

21 May 2023

Bits from Debian: Infomaniak First Platinum Sponsor of DebConf23

We are pleased to announce that Infomaniak has committed to sponsor DebConf23 as a Platinum Sponsor.

Infomaniak is a key player in the European Cloud and the leading developer of Web technologies in Switzerland. It aims to be an independent European alternative to the web giants and is committed to an ethical and sustainable Web that respects privacy and creates local jobs. Infomaniak develops cloud solutions (IaaS, PaaS, VPS), productivity tools for online collaboration, and video and radio streaming services. The company uses only renewable electricity, offsets 200% of its CO2 emissions and extends the life of its servers up to 15 years. The company cools its infrastructure with filtered air, without air conditioning, and is building a new data centre that will fully recycle the energy it consumes to heat up to 6,000 homes.

With this commitment as Platinum Sponsor, Infomaniak is helping to make our annual conference possible, and directly supporting the progress of Debian and Free Software, helping to strengthen the community that continues to collaborate on Debian projects throughout the rest of the year.

Thank you very much, Infomaniak, for your support of DebConf23!

Become a sponsor too! DebConf23 will take place from September 10th to 17th, 2023 in Kochi, India, and will be preceded by DebCamp, from September 3rd to 9th. DebConf23 is accepting sponsors! Interested companies and organizations may contact the DebConf team through sponsors@debconf.org, and visit the DebConf23 website at https://debconf23.debconf.org/sponsors/become-a-sponsor/.

5 May 2023

Valhalla's Things: Hiking Slippers

Posted on May 5, 2023
When I travel for a few days I don't usually1 bring any other shoes than the ones I'm wearing, plus some kind of slippers for use inside hotel / B&B rooms. It's good for not carrying useless weight, but it always leaves me with a vague feeling of "what if my only shoes break", followed by "on a Sunday, when the shops are closed". So I started to think in the general direction of hiking sandals: shoes that are designed to be worn when resting, and lightweight to carry, but are a passable substitute for regular shoes in case of an accident to the main ones, maybe with the help of an extra pair of socks2 (or when crossing fords, but that's not really a use case I have).

My requirements are easier than the ones for real hiking sandals, since I'm only going to be walking on paved streets (or at most easy unpaved ones), and the weight considerations are a thing, but not as strict as if I had to carry these on my back while hiking many hours in a day.

My first attempt was a pair of hiking sandals made from things I already had in my stash, with vibram soles, neoprene padding and polyester webbing. After a couple of fixes they sort of worked, but they had a few big issues.
  • While comfortable when worn, the neoprene made the sandals hard to make, as it tended to deform while being assembled.
  • Polyester webbing is slippery. Some strips of hot glue in strategic places helped, but they weren't perfect, and over time they have been peeling off.
  • Most importantly, to make the sandals stable enough to wear while walking I had to add a strap around the ankle that needs closing: this makes it a bit of a hassle to use the sandals, say, when waking up in the middle of the night for metabolic reasons.
And then, one day I made my linen slippers, and that led me to think again about the problem: what if I made a pair of slippers with a rubber sole, technical materials and maybe uppers made of net, so that they would be lightweight, breathable and possibly even still suitable in case I ever need to cross a ford?

This was also readily attainable from the stash: some polycotton for the sole lining, elastic mesh for the uppers, EVA foam for padding and vibram soles. I decided to assemble most of them by machine, and it was quick and painless (possibly also thanks to the elasticity of the mesh).

For the soles I may have gone a bit overboard with the vibram claw, but:
  • I already had it in the stash;
  • if I need to wear them on an unpaved road, they are going to be suitable;
  • why not?
The soles were glued to the slippers rather than being sewn, as I don't think there is a reasonable way to sew these soles; I hope it won't cause durability issues later on (if it does, there will be an update).

As for the finished weight, at 235 g for the pair I thought I could do better, but apparently shoes are considered ultralight if they are around 500 g? Using just one layer of mesh rather than two would probably help, but it would have required a few changes to the pattern, and anyway I don't really need to carry them around all day.

I've also added a loop of fabric (polycotton) to the centre back to be able to hang the slippers from the backpack when wet or dirty; a bit of narrow webbing may have been better, but I didn't have any in my stash.

The pattern is the same as that used for the linen slippers, and of course it's released as #FreeSoftWear.

I've worn these for a few days around the home and they worked just fine, except for the fact that I had to re-glue the sole in a few places (but I suspect it was glued badly in the first place, since the other sole had no issues). Right now I have no plans to travel, so I don't know how much I will be able to test these in the next few months, but sooner or later I will (or I'll keep wearing them at home after I've thoroughly tested the linen ones), and if there are issues I will post them here on the blog (and add a link to this post).

  1. the exception would be when I'm also bringing some kind of costume, and even there it's not always true.
  2. and one should always carry an extra pair of clean socks, as they are useful for so many things, as Pratchett reminds us.

30 April 2023

Russell Coker: Links April 2023

Cory Doctorow has an insightful article Gig Work is the Opposite of Steampunk [1] about the horrors that companies like Amazon are forcing on their employees.

Valerie Aurora and Leigh Honeywell wrote an insightful article about the al Capone theory of sexual harassment [2]: why people who sexually harass others usually perform other anti-social activity that is also easier to prosecute.

The IEEE has an interesting article about using ML for parts of the CPU design process, both the technical issues and the controversy about competing ideas which is probably caused by sexism [3].

"Love and taxes are forever my heart" is a line from an anime dating sim game that prepares US taxes [4]. Unfortunately it was removed from Steam. The existence of the game is a weird social commentary, and removing the game because you can't have an anime hottie do taxes is bizarre but also understandable given liability issues. There's no mention in the review of whether male hotties are available for people who prefer dating guys. As an aside, my accountant looks like he is allergic to exercise.

The Killdozer Book web site (which has an invalid SSL certificate, so you have to click on "advanced" in Chrome to get to the content) has an insightful article debunking some of the stories about the Killdozer [5]. He wasn't some sort of hero of freedom, he was just a jerk who reneged on a deal hoping to get more money, thought that laws shouldn't apply to him, and killed himself because of it.

Apparently some big tech companies are knowingly hiring people to not work, unlike the usual large corporate case of unknowingly hiring people to not work [6]. Silicon Valley is a good TV show, and it's apparently realistic.

Ron Garrett wrote an insightful blog post about theoretical attacks on Bitcoin and how Bitcoin could be used [7]. The conclusion is not good for Bitcoin.

Compiler Explorer is a program that shows how various C++ compilers produce assembly code for various architectures; this site hosts the main active instance [8]. There are other instances; here is an instance that produces code for the Ruzzian Elbrus architecture [9]. The Elbrus Wikipedia page is interesting [10]. Apparently the Ruzzians don't want this information to be spread, LOL.

The Smithsonian Magazine has an interesting article about pet parrots being taught to video call each other [11]. Apparently parrots are social animals and can develop psychological problems if kept alone, so the video calls can be good for them. Also the owners had to monitor the chats to ensure that they played nicely together, just like play-dates for kids!

Phoronix has an amusing article about the drama regarding the AMD Spectral Chicken bit in the Linux kernel source [12].

This page listing bad free software licenses is amusing [13].

The ACS has an interesting article about how Samsung fakes photos of the moon, and presumably could fake other photos of notable objects that don't change [14]. The way that they proved the forgery was interesting.

29 April 2023

Simon Josefsson: How To Trust A Machine

Let's reflect on some of my recent work, which started with understanding Trisquel GNU/Linux, improving transparency into apt-archives, working on reproducible builds of Trisquel, strengthening verification of apt-archives with Sigstore, and finally thinking about security device threat models. A theme in all this is improving methods to have trust in machines, or generally any external entity. While I believe that everything starts by trusting something, usually something familiar and well-known, we need to deal with misuse of that trust that leads to failure to deliver what is desired and expected from the trusted entity.

How can an entity behave to invite trust? Let's argue for some properties that can be quantitatively measured, with a focus on computer software and hardware: Essentially, this boils down to: Trust, Verify and Hold Accountable. To put this dogma in perspective, it helps to understand that this approach may be harmful to human relationships (which could explain the social awkwardness of hackers), but it remains useful as a method to improve the design of computer systems, and a useful method to evaluate the safety of computer systems. When a system fails some of the criteria above, we know we have more work to do to improve it.

How far have we come on this journey? Through earlier efforts, we are in a fairly good situation. Richard Stallman, through GNU/FSF, made us aware of the importance of free software; the Reproducible/Bootstrappable build projects made us aware of the importance of verifiability; and Certificate Transparency highlighted the need for accountable signature logs, leading to efforts like Sigstore for software. None of these efforts would have seen the light of day unless people wrote free software and packaged it into distributions that we can use, and built hardware that we can run it on. While there certainly exists more work to be done on the software side, with the recent amazing full-source build of Guix based on a 357-byte hand-written seed, I believe that we are closing that loop on the software engineering side.

So what remains? Some inspiration for further work: Onwards and upwards, happy hacking!

Update 2023-05-03: Added the Liberating property regarding free software, instead of having it be part of the Verifiability and Transparency properties.

27 April 2023

Arturo Borrero González: Kubecon and CloudNativeCon 2023 Europe summary

This post serves as a report from my attendance at Kubecon and CloudNativeCon 2023 Europe, which took place in Amsterdam in April 2023. It was my second time physically attending this conference; the first one was in Austin, Texas (USA) in 2017. I also attended once in a virtual fashion. The content here is mostly generated for the sake of my own recollection and learnings, and is written from the notes I took during the event.

The very first session was the opening keynote, which reunited the whole crowd to bootstrap the event and share the excitement about the days ahead. Some astonishing numbers were announced: there were more than 10.000 people attending, and apparently it could confidently be said that it was the largest open source technology conference taking place in Europe in recent times. It was also communicated that the next couple of iterations of the event will be run in China in September 2023 and Paris in March 2024. More numbers: the CNCF was hosting about 159 projects, involving 1300 maintainers and about 200.000 contributors. The cloud-native community is ever-increasing, and there seems to be a strong trend in the industry for cloud-native technology adoption and all things related to PaaS and IaaS.

The event program had different tracks, and in each one there was an interesting mix of low-level and higher-level talks for a variety of audiences. On many occasions I found that reading the talk title alone was not enough to know in advance if a talk was a 101 kind of thing or for experienced engineers. But unlike in previous editions, I didn't have the feeling that the purpose of the conference was to try selling me anything. Obviously, speakers would make sure to mention, or highlight in a subtle way, the involvement of a given company in a given solution or piece of the ecosystem. But it was non-invasive and fair enough for me.

On a different note, I found the breakout rooms to be often small. I think there were only a couple of rooms that could accommodate more than 500 people, which is a fairly small allowance for 10k attendees. I realized with frustration that the more interesting talks were immediately fully booked, with people waiting in line some 45 minutes before the session time. Because of this, I missed a few important sessions that I'll hopefully watch online later.

Finally, on a more technical side, I've learned many things, which instead of grouping by session I'll group by topic, given how some subjects were mentioned in several talks.

On gitops and CI/CD pipelines

Most of the mentions went to FluxCD and ArgoCD. At that point there were no doubts that gitops was a mature approach and both flux and argoCD could do an excellent job. ArgoCD seemed a bit more over-engineered, to be a more general-purpose CD pipeline, and flux felt a bit more tailored for simpler gitops setups. I discovered that both have nice web user interfaces that I wasn't previously familiar with. However, in two different talks I got the impression that the initial setup of them was simple, but migrating your current workflow to gitops could result in a bumpy ride. That is, the challenge is not deploying flux/argo itself, but moving everything into a state that both humans and flux/argo can understand. I also saw some curious mentions of the config drifts that can happen in some cases, even if the goal of gitops is precisely for that to never happen. Such mentions were usually accompanied by some hints on how to handle the situation by hand.
Worth mentioning, I missed any practical information about one of the key pieces of this whole gitops story: building container images. Most of the showcased scenarios were using pre-built container images, so in that sense they were simple. Building and pushing to an image registry is one of the two key points we would need to solve in Toolforge Kubernetes if adopting gitops. In general, even if gitops was already on our radar for Toolforge Kubernetes, I think it climbed a few steps in my priority list after the conference. Another learning was this site: https://opengitops.dev/.

On etcd, performance and resource management

I attended a talk focused on etcd performance tuning that was very encouraging. They were basically talking about the exact same problems we have had in Toolforge Kubernetes, like api-server and etcd failure modes, and how sensitive etcd is to disk latency, IO pressure and network throughput. Even though the Toolforge Kubernetes scale is small compared to other Kubernetes deployments out there, I found it very interesting to see others' approaches to the same set of challenges.

I learned how most Kubernetes components and apps can overload the api-server, because even the api-server talks to itself. Simple things like kubectl may have a completely different impact on the API depending on usage, for example when listing the whole list of objects (very expensive) vs a single object. The conclusion was to try to avoid hitting the api-server with LIST calls, and to use ResourceVersion, which avoids full dumps from etcd (and which, by the way, is the default when using bare kubectl get calls). I already knew some of this, and for example the jobs-framework-emailer was already making use of this ResourceVersion functionality. (A small sketch of the difference appears just before the networking notes below.)

There have been a lot of improvements on the performance side of Kubernetes in recent times, or more specifically, in how resources are managed and used by the system. I saw a review of resource management from the perspective of the container runtime and kubelet, and plans to support fancy things like topology-aware scheduling decisions and dynamic resource claims (changing the pod resource claims without re-defining/re-starting the pods).

On cluster management, bootstrapping and multi-tenancy

I attended a couple of talks that mentioned kubeadm, and one in particular was from the maintainers themselves. This was of interest to me because as of today we use it for Toolforge. They shared all the latest developments and improvements, and the plans and roadmap for the future, with a special mention of something they called the "kubeadm operator", apparently capable of auto-upgrading the cluster, auto-renewing certificates and such. I also saw a comparison between the different cluster bootstrappers, which to me confirmed that kubeadm was the best, from the point of view of being a well-established and well-known workflow, plus having a very active contributor base. The kubeadm developers invited the audience to submit feature requests, so I did.

The different talks confirmed that the basic unit for multi-tenancy in Kubernetes is the namespace. Any serious multi-tenant usage should leverage this. There were some ongoing conversations, in official sessions and in the hallway, about the right tool to implement K8s-within-K8s, and vcluster was mentioned enough times for me to be convinced it was the right candidate. This was despite my impression that multiclusters / multicloud are regarded as hard topics in the general community. I definitely would like to play with it sometime down the road.
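As promised above, here is a minimal sketch of the LIST-call difference discussed in the etcd talk. This is my own illustration, not from any slide; the raw API paths are standard, but verify the exact behaviour against your cluster version:

# Expensive: a LIST with no resourceVersion forces a quorum read,
# i.e. a full dump of the matching objects from etcd.
% kubectl get --raw '/api/v1/pods'

# Cheaper: resourceVersion=0 allows the api-server to answer from
# its own watch cache, without hitting etcd at all.
% kubectl get --raw '/api/v1/pods?resourceVersion=0'

The same applies when using client libraries: a ListOptions with ResourceVersion set to "0" asks for "any reasonably recent" data rather than a fresh read from etcd.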
On networking

I attended a couple of basic sessions that served really well to understand how Kubernetes instruments the network to achieve its goal. The conference program had sessions covering topics ranging from network debugging recommendations and CNI implementations to IPv6 support. Also, one of the keynote sessions had a reference to how kube-proxy is not able to perform NAT for SIP connections, which is interesting because I believe Netfilter Conntrack could do it if properly configured. One of the conclusions on the CNI front was that Calico has massive community adoption (in Netfilter mode), which is reassuring, especially considering it is the one we use for Toolforge Kubernetes.

On jobs

I attended a couple of talks related to HPC/grid-like usages of Kubernetes. I was truly impressed by some folks out there who were using Kubernetes Jobs on massive scales, such as to train machine learning models and other fancy AI projects. It is acknowledged in the community that the early implementation of things like Jobs and CronJobs had some limitations that are now gone, or at least greatly improved. Some new functionalities have been added as well. Indexed Jobs, for example, enable each Job to have a number (index) and process a chunk of a larger batch of data based on that index. This would allow for full grid-like features like sequential (or, again, indexed) processing, coordination between Jobs and more graceful Job restarts. My first reaction was: is that something we would like to enable in the Toolforge Jobs Framework?

On policy and security

A surprisingly good number of sessions covered interesting topics related to policy and security. It was nice to learn two realities:
  1. Kubernetes is capable of doing pretty much anything security-wise and can be used to create greatly secured environments.
  2. It does not do so by default. The defaults are not security-strict on purpose.
It kind of made sense to me: Kubernetes is used for a wide range of use cases, and the developers didn't know beforehand which particular setup the default security levels should accommodate. One session in particular covered the most basic security features that should be enabled for any Kubernetes system that would get exposed to random end users. In my opinion, the Toolforge Kubernetes setup was already doing a good job in that regard. To my joy, some sessions referred to the Pod Security Admission mechanism, which is one of the key security features we're about to adopt (when migrating away from Pod Security Policy). I also learned a bit more about Secret resources, their current implementation and how to leverage a combo of CSI and RBAC for a more secure usage of external secrets.

Finally, one of the major takeaways from the conference was learning about kyverno and kubeaudit. I was previously aware of the OPA Gatekeeper. From the several demos I saw, it was clear to me that kyverno should help us make Toolforge Kubernetes more sustainable by replacing all of our custom admission controllers with it. I already opened a ticket to track this idea, which I'll be proposing to my team soon.

Final notes

In general, I believe I learned many things, and perhaps even more importantly I re-learned some stuff I had forgotten because of lack of daily exposure. I'm really happy that the cloud-native way of thinking was reinforced in me, which I still need because most of my muscle memory for approaching systems architecture and engineering is from the old pre-cloud days.

List of sessions I attended on the first day:
List of sessions I attended on the second day:
List of sessions I attended on the third day:
The videos have been published on Youtube.

14 April 2023

John Goerzen: Easily Accessing All Your Stuff with a Zero-Trust Mesh VPN

Probably everyone is familiar with a regular VPN. The traditional use case is to connect to a corporate or home network from a remote location, and access services as if you were there.

But these days, the notions of corporate network and home network are less based around physical location. For instance, a company may have no particular office at all, or may have a number of offices plus a number of people working remotely, and so forth. A home network might have, say, a PVR and file server, while highly portable devices such as laptops, tablets, and phones may want to talk to each other regardless of location. For instance, a family member might be traveling with a laptop, another at a coffee shop, and those two devices might want to communicate, in addition to talking to the devices at home.

And, in both scenarios, there might be questions about giving limited access to friends. Perhaps you'd like to give a friend access to part of your file server, or as a company, you might have contractors working on a limited project.

Pretty soon you wind up with a mess of VPNs, forwarded ports, and tricks to make it all work. With the increasing prevalence of CGNAT, a lot of times you can't even open a port to the public Internet. Each application or device probably has its own gateway just to make it visible on the Internet, some of which you pay for.

Then you add on the question of: should you really trust your LAN anyhow? With possibilities of guests using it, rogue access points, etc., the answer is probably "no".

We can move the responsibility for dealing with NAT, fluctuating IPs, encryption, and authentication from the application layer further down into the network stack. We then arrive at a much simpler picture for all. So this page is fundamentally about making the network work, simply and effectively.

How do we make the Internet work in these scenarios? We're going to combine three concepts:
  1. A VPN, providing fully encrypted and authenticated communication and stable IPs
  2. Mesh Networking, in which devices automatically discover optimal paths to reach each other
  3. Zero-trust networking, in which we do not need to trust anything about the underlying LAN, because all our traffic uses the secure systems in points 1 and 2.
By combining these concepts, we arrive at some nice results:
  • You can ssh hostname, where hostname is one of your machines (server, laptop, whatever), and as long as hostname is up, you can reach it, wherever it is, wherever you are.
    • Combined with mosh, these sessions will be durable even across moving to other host networks.
    • You could just as well use telnet, because the underlying network should be secure.
  • You don't have to mess with encryption keys, certs, etc., for every internal-only service. Since IPs are now trustworthy, that's all you need. hosts.allow could make a comeback! (See the sketch just after this list.)
  • You have a way of transiting out of extremely restrictive networks. Every tool discussed here has a way of falling back on routing things via a broker (relay) on TCP port 443 if all else fails.
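To illustrate the hosts.allow comeback: a minimal sketch, assuming a Yggdrasil-style mesh where every node gets a stable address out of the 200::/7 range (substitute whatever range or addresses your chosen tool assigns). With TCP wrappers, sshd can then be restricted to mesh peers only:

# /etc/hosts.allow -- permit ssh from mesh addresses only
sshd: [200::]/7

# /etc/hosts.deny -- refuse everything else
sshd: ALL

The point is not TCP wrappers specifically, but that once the addresses themselves are authenticated by the mesh, simple IP-based access control becomes meaningful again.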
There might sometimes be tradeoffs. For instance:
  • On LANs faster than 1Gbps, performance may degrade due to encryption and encapsulation overhead. However, these tools should let hosts discover the locality of each other and not send traffic over the Internet if the devices are local.
  • With some of these tools, hosts local to each other (on the same LAN) may be unable to find each other if they can't reach the control plane over the Internet (because the Internet or the provider is down).
Some other features that some of the tools provide include:
  • Easy sharing of limited access with friends/guests
  • Taking care of everything you need, including SSL certs, for exposing a certain on-net service to the public Internet
  • Optional routing of your outbound Internet traffic via an exit node on your network. Useful, for instance, if your local network is blocking tons of stuff.
Let s dive in.

Types of Mesh VPNs

I'll go over several types of meshes in this article:
  1. Fully decentralized with automatic hop routing: This model has no special central control plane. Nodes discover each other in various ways, and establish routes to each other. These routes can be direct connections over the Internet, or via other nodes. This approach offers the greatest resilience. Examples I'll cover include Yggdrasil and tinc.
  2. Automatic peer-to-peer with centralized control: In this model, nodes, by default, communicate by establishing direct links between them. A regular node never carries traffic on behalf of other nodes. Special-purpose relays are used to handle cases in which NAT traversal is impossible. This approach tends to offer simple setup. Examples I'll cover include Tailscale, Zerotier, Nebula, and Netmaker.
  3. Roll your own and hybrid approaches: This is a grab bag of other ideas; for instance, running Yggdrasil over Tailscale.

Terminology

For the sake of consistency, I'm going to use common language to discuss things that have different terms in different ecosystems:
  • Every tool discussed here has a way of dealing with NAT traversal. It may assist with establishing direct connections (eg, STUN), and if that fails, it may simply relay traffic between nodes. I'll call such a relay a "broker". This may or may not be the same system that is a control plane for a tool.
  • All of these systems operate over lower layers that are unencrypted. Those lower layers may be a LAN (wired or wireless, which may or may not have Internet access), or the public Internet (IPv4 and/or IPv6). I'm going to call the unencrypted lower layer, whatever it is, the "clearnet".

Evaluation Criteria

Here are the things I want to see from a solution:
  • Secure, with all communications end-to-end encrypted and authenticated, and prevention of traffic from untrusted devices.
  • Flexible, adapting to changes in network topology quickly and automatically.
  • Resilient, without single points of failure, and with devices local to each other able to communicate even if cut off from the Internet or other parts of the network.
  • Private, minimizing leakage of information or metadata about me and my systems
  • Able to traverse CGNAT without having to use a broker whenever possible
  • A lesser requirement for me, but still a nice to have, is the ability to include others via something like Internet publishing or inviting guests.
  • Fully or nearly fully Open Source
  • Free or very cheap for personal use
  • Wide operating system support, including headless Linux on x86_64 and ARM.

Fully Decentralized VPNs with Automatic Hop Routing

Two systems fit this description: Yggdrasil and Tinc. Let's dive in.

Yggdrasil

I'll start with Yggdrasil because I've written so much about it already. It featured in prior posts such as:

Yggdrasil can be a private mesh VPN, or something more

Yggdrasil can be a private mesh VPN, just like the other tools covered here. It's unique, however, in that a key goal of the project is to also make it useful as a planet-scale global mesh network. As such, Yggdrasil is a testbed of new ideas in distributed routing designed to scale up to massive sizes and all sorts of connection conditions. As of 2023-04-10, the main global Yggdrasil mesh has over 5000 nodes in it. You can choose whether or not to participate.

Every node in a Yggdrasil mesh has a public/private keypair. Each node then has an IPv6 address (in a private address space) derived from its public key. Using these IPv6 addresses, you can communicate right away.

Yggdrasil differs from most of the other tools here in that it does not necessarily seek to establish a direct link on the clearnet between, say, host A and host G for them to communicate. It will prefer such a direct link if it exists, but it is perfectly happy if it doesn't. The reason is that every Yggdrasil node is also a router in the Yggdrasil mesh. Let's sit with that concept for a moment. Consider:
  • If you have a bunch of machines on your LAN, but only one of them can peer over the clearnet, that's fine; all the other machines will discover this route to the world and use it when necessary.
  • All you need to run a broker is just a regular node with a public IP address. If you are participating in the global mesh, you can use one (or more) of the free public peers for this purpose.
  • It is not necessary for every node to know about the clearnet IP address of every other node (improving privacy). In fact, it's not even necessary for every node to know about the existence of all the other nodes, so long as it can find a route to a given node when it's asked to.
  • Yggdrasil can find one or more routes between nodes, and it can use this knowledge of multiple routes to aggressively optimize for varying network conditions, including combinations of, say, downloads and low-latency ssh sessions.
Behind the scenes, Yggdrasil calculates optimal routes between nodes as necessary, using a mesh-wide DHT for initial contact and then deriving more optimal paths. (You can also read more details about the routing algorithm.)

One final way that Yggdrasil is different from most of the other tools is that there is no separate control server. No node is "special", in charge, the sole keeper of metadata, or anything like that. The entire system is completely distributed and auto-assembling.

Meeting neighbors

There are two ways that Yggdrasil knows about peers:
  • By broadcast discovery on the local LAN
  • By listening on a specific port (or being told to connect to a specific host/port)
Sometimes this might lead to multiple ways to connect to a node; Yggdrasil prefers the connection auto-discovered by broadcast first, then the lowest-latency of the defined paths. In other words, when your laptops are in the same room as each other on your local LAN, your packets will flow directly between them without traversing the Internet.

Unique uses

Yggdrasil is uniquely suited to network-challenged situations. As an example, in a post-disaster situation, Internet access may be unavailable or flaky, yet there may be many local devices (perhaps ones that had never known of each other before) that could share information. Yggdrasil meets this situation perfectly. The combination of broadcast auto-detection, distributed routing, and so forth basically means that if there is any physical path between two nodes, Yggdrasil will find and enable it.

Ad-hoc wifi is rarely used because it is a real pain. Yggdrasil actually makes it useful! Its broadcast discovery doesn't require any IP address provisioned on the interface at all (it just uses the IPv6 link-local address), so you don't need to figure out a DHCP server or some such. And Yggdrasil will tend to perform routing along the contours of the RF path. So you could have a laptop in the middle of a long distance relaying communications from people farther out, because it could see both. Or even a chain of such things.

Yggdrasil: Security and Privacy Yggdrasil's mesh is aggressively greedy. It will peer with any node it can find (unless told otherwise) and will find a route to anywhere it can. There are two main ways to make sure you keep unauthorized traffic out: by restricting who can talk to your mesh, and by firewalling the Yggdrasil interface. Both can be used, and they can be used simultaneously. I'll discuss firewalling more at the end of this article. Basically, you'll almost certainly want to do this if you participate in the public mesh, because doing so is akin to having a globally-routable public IP address direct to your device. If you want to restrict who can talk to your mesh, you just disable the broadcast feature on all your nodes (empty MulticastInterfaces section in the config), and avoid telling any of your nodes to connect to a public peer. You can set a list of authorized public keys that can connect to your nodes' listening interfaces, which you'll probably want to do. You will probably want to either open up some inbound ports (if you can) or set up a node with a known clearnet IP on a place like a $5/mo VPS to help with NAT traversal (again, setting AllowedPublicKeys as appropriate). Yggdrasil doesn't allow filtering multicast clients by public key, only by network interface, so that's why we disable broadcast discovery. You can easily enough teach Yggdrasil about static internal LAN IPs of your nodes and have things work that way. (Or, set up an internal gateway node or two, that the clients just connect to when they're local.) But fundamentally, you need to put a bit more thought into this with Yggdrasil than with the other tools here, which are closed-only. Compared to some of the other tools here, Yggdrasil is better about information leakage; nodes only know details, such as clearnet IPs, of directly-connected peers. You can obtain the list of directly-connected peers of any known node in the mesh, but that list is the public keys of the directly-connected peers, not the clearnet IPs. Some of the other tools contain a limited integrated firewall of sorts (with limited ACLs and such). Yggdrasil does not, but is fully compatible with on-host firewalls. I recommend these anyway, even with many other tools.
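A sketch of that lockdown in yggdrasil.conf terms (the key below is an obviously fake placeholder):
# Disable LAN broadcast discovery entirely:
MulticastInterfaces: []
# Accept incoming peerings only from these node public keys:
AllowedPublicKeys: ["0000000000000000000000000000000000000000000000000000000000000000"]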

Yggdrasil: Connectivity and NAT traversal Compared to the other tools, Yggdrasil is an interesting mix. It provides a fully functional mesh and facilitates connectivity in situations in which no other tool can. Yet its NAT traversal, while it exists and does work, results in using a broker under some of the more challenging CGNAT situations more often than some of the other tools, which can impede performance. Yggdrasil's underlying protocol is TCP-based. Before you run away screaming that it must be slow and unreliable like OpenVPN over TCP: it's not, and it is even surprisingly good around bufferbloat. I've found its performance to be on par with the other tools here, and it works as well as I'd expect even on flaky 4G links. Overall, the NAT traversal story is mixed. On the one hand, you can run a node that listens on port 443, and Yggdrasil can even make it speak TLS (even though that's unnecessary from a security standpoint), so you can likely get out of most restrictive firewalls you will ever encounter. If you join the public mesh, know that plenty of public peers do listen on port 443 (and other well-known ports like 53, plus random high-numbered ones). If you connect your system to multiple public peers, there is a chance, though a very small one, that some public transit traffic might be routed via it. In practice, public peers hopefully are already peered with each other, preventing this from happening (you can verify this with yggdrasilctl debug_remotegetpeers key=ABC...). I have never experienced a problem with this. Also, since latency is a factor in routing for Yggdrasil, it is highly unlikely that the random connections we use are going to be competitive with datacenter peers.

Yggdrasil: Sharing with friends If you're open to participating in the public mesh, this is one of the easiest things of all. Have your friend install Yggdrasil, point them to a public peer, give them your Yggdrasil IP, and that's it. (Well, presumably you also open up your firewall; you did follow my advice to set one up, right?) If your friend is visiting at your location, they can just hop on your wifi, install Yggdrasil, and it will automatically discover a route to you. Yggdrasil even has a zero-config mode for ephemeral nodes such as certain Docker containers. Yggdrasil doesn't directly support publishing to the clearnet, but it is certainly possible to proxy (or even NAT) to/from the clearnet, and people do.

Yggdrasil: DNS There is no particular extra DNS in Yggdrasil. You can, of course, run a DNS server within Yggdrasil, just as you can anywhere else. Personally I just add relevant hosts to /etc/hosts and leave it at that, but it's up to you.
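For example, a couple of hypothetical /etc/hosts entries (Yggdrasil addresses live in 200::/7; these ones are made up):
200:1111:2222:3333:4444:5555:6666:7777  server1.ygg
201:abcd:ef01:2345:6789:abcd:ef01:2345  laptop.ygg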

Yggdrasil: Source code, pricing, and portability Yggdrasil is fully open source (LGPLv3 plus additional permissions in an exception) and highly portable. It is written in Go, and has prebuilt binaries for all major platforms (including a Debian package which I made). There is no charge for anything with Yggdrasil. Listed public peers are free and run by volunteers. You can run your own peers if you like; they can be public and unlisted, public and listed (just submit a PR to get it listed), or private (accepting connections only from certain nodes' keys). A peer in this case is just a node with a known clearnet IP address. Yggdrasil encourages use in other projects. For instance, NNCP integrates a Yggdrasil node for easy communication with other NNCP nodes.

Yggdrasil conclusions Yggdrasil is tops in reliability (having no single point of failure) and flexibility. It will maintain opportunistic connections between peers even if the Internet is down. The unique added feature of being able to be part of a global mesh is a nice one. The tradeoffs include being more prone to needing a broker in restrictive CGNAT environments. Some other tools have clients that override the OS DNS resolver to also provide resolution of hostnames of member nodes; Yggdrasil doesn't, though you can certainly run your own DNS infrastructure over Yggdrasil (or, for that matter, let public DNS servers provide Yggdrasil answers if you wish). There is also a need to pay more attention to firewalling or maintaining separation from the public mesh. However, as I explain below, many other options have potential impacts if the control plane, or your account for it, is compromised, meaning you ought to firewall those, too. Still, it may be a more immediate concern with Yggdrasil. Although Yggdrasil is listed as experimental, I have been using it for over a year and have found it to be rock-solid. They did change how mesh IPs were calculated when moving from 0.3 to 0.4, causing a global renumbering, so just be aware that this is a possibility while it is experimental.

tinc tinc is the oldest tool on this list; version 1.0 came out in 2003! You can think of tinc as something akin to an older Yggdrasil without the public option. I will be discussing tinc 1.0.36, the latest stable version, which came out in 2019. The development branch, 1.1, has been going since 2011 and had its latest release in 2021. The last commit to the Github repo was in June 2022. Tinc is the only tool here to support both tun and tap style interfaces. I go into the difference more in the Zerotier review below. Tinc actually provides a better tap implementation than Zerotier, with various sane options for broadcasts, but I still think the call for an Ethernet, as opposed to IP, VPN is small. To configure tinc, you generate a per-host configuration and then distribute it to every tinc node. It contains a host's public key. Therefore, adding a host to the mesh means distributing its key everywhere; de-authorizing it means removing its key everywhere. This makes it rather unwieldy. tinc can do LAN broadcast discovery and mesh routing, but generally speaking you must manually teach it where to connect initially. Somewhat confusingly, the examples all mention listing a public address for a node. This doesn't make sense for a laptop, and I suspect you'd just omit it. I think that address is used for something akin to a Yggdrasil peer with a clearnet IP. Unlike all of the other tools described here, tinc has no tool to inspect the running state of the mesh. Some of the properties of tinc made it clear I was unlikely to adopt it, so this review wasn't as thorough as that of Yggdrasil.
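To make the configuration model concrete, a hedged sketch for a network named mynet (names, addresses, and the subnet are placeholders; consult the tinc 1.0 manual for the authoritative format):
# /etc/tinc/mynet/tinc.conf on the laptop:
Name = laptop
ConnectTo = server1
# /etc/tinc/mynet/hosts/server1, which must be copied to every node:
Address = server1.example.com
Subnet = 192.168.50.1/32
# ...followed by server1's RSA public key, as generated by: tincd -n mynet -K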

tinc: Security and Privacy As mentioned above, every host in the tinc mesh is authenticated based on its public key. However, to be more precise, this key is validated only at the point it connects to its next hop peer. (To be sure, this is also the same as how the list of allowed pubkeys works in Yggdrasil.) Since IPs in tinc are not derived from their key, and any host can assign itself whatever mesh IP it likes, this implies that a compromised host could impersonate another. It is unclear whether packets are end-to-end encrypted when using a tinc node as a router. The fact that they can be routed at the kernel level by the tun interface implies that they may not be.

tinc: Connectivity and NAT traversal I was unable to find much information about NAT traversal in tinc, other than that it does support it. tinc can run over UDP or TCP and auto-detects which to use, preferring UDP.

tinc: Sharing with friends tinc has no special support for this, and the difficulty of configuration makes it unlikely you'd do this with tinc.

tinc: Source code, pricing, and portability tinc is fully open source (GPLv2). It is written in C and generally portable. It supports some very old operating systems. Mobile support is iffy. tinc does not seem to be very actively maintained.

tinc conclusions I haven't mentioned performance in my other reviews (see the section at the end of this post). But tinc's is so poor that it only manages about 300Mbps on my 2.5Gbps network. That's 1/3 the speed of Yggdrasil or Tailscale. Combine that with the unwieldiness of adding hosts and some uncertainties in security, and I'm not going to be using tinc.

Automatic Peer-to-Peer Mesh VPNs with centralized control These tend to be the options that are frequently discussed. Let's talk about the options.

Tailscale Tailscale is a popular choice in this type of VPN. To use Tailscale, you first sign up on tailscale.com. Then, you install the tailscale client on each machine. On first run, it prints a URL for you to click on to authorize the client to your mesh (a "tailnet"). Tailscale assigns a mesh IP to each system. The Tailscale client lets the Tailscale control plane gather IP information about each node, including all detectable public and private clearnet IPs. When you attempt to contact a node via Tailscale, the client will fetch the known contact information from the control plane and attempt to establish a link. If it can contact over the local LAN, it will (it doesn't have broadcast autodetection like Yggdrasil; the information must come from the control plane). Otherwise, it will try various NAT traversal options. If all else fails, it will use a broker to relay traffic; Tailscale calls a broker a DERP relay server. Unlike Yggdrasil, a Tailscale node never relays traffic for another; all connections are either direct P2P or via a broker. Tailscale, like several others, is based around Wireguard, though wireguard-go rather than the in-kernel Wireguard. Tailscale has a number of somewhat unique features in this space:
  • Funnel, which lets you expose ports on your system to the public Internet via the VPN.
  • Exit nodes, which automate the process of routing your public Internet traffic over some other node in the network. This is possible with every tool mentioned here, but Tailscale makes switching it on or off a couple of quick commands away.
  • Node sharing, which lets you share a subset of your network with guests
  • A fantastic set of documentation, easily the best of the bunch.
Funnel, in particular, is interesting. With a couple of tailscale serve-style commands, you can expose a directory tree (or a development webserver) to the world. Tailscale gives you a public hostname, obtains a cert for it, and proxies inbound traffic to you. This is subject to some unspecified bandwidth limits, and you can only choose from three public ports, so it's not really a production solution, but as a quick and easy way to demonstrate something cool to a friend, it's a neat feature.
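For the exit-node feature mentioned above, the switch really is a couple of commands; a sketch with a placeholder node IP (the exit node must also be approved in the admin console):
tailscale up --advertise-exit-node   # on the node that will carry your traffic
tailscale up --exit-node=100.64.0.1  # on the client, to route everything via it
tailscale up --exit-node=            # to switch back off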

Tailscale: Security and Privacy With Tailscale, as with the other tools in this category, one of the main threats to consider is the control plane. What are the consequences of a compromise of Tailscale's control plane, or of the credentials you use to access it? Let's begin with the credentials used to access it. Tailscale operates no identity system itself, instead relying on third parties. For individuals, this means Google, Github, or Microsoft accounts; Okta and other SAML and similar identity providers are also supported, but this runs into complexity and expense that most individuals won't want to take on. Unfortunately, all three of those types of accounts often have saved auth tokens in a browser. Personally I would rather have a separate, very secure, login. If a person does compromise your account or the Tailscale servers themselves, they can't directly eavesdrop on your traffic because it is end-to-end encrypted. However, assuming an attacker obtains access to your account, they could:
  • Tamper with your Tailscale ACLs, permitting new actions
  • Add new nodes to the network
  • Forcibly remove nodes from the network
  • Enable or disable optional features
Of note is that they cannot just commandeer an existing IP. I would say the riskiest possibility here is that they could add new nodes to the mesh. Because they could also tamper with your ACLs, they could then proceed to attempt to access all your internal services. They could even turn on service collection and have Tailscale tell them what and where all the services are. Therefore, as with other tools, I recommend a local firewall on each machine with Tailscale. More on that below. Tailscale has a new alpha feature called "tailnet lock" which helps with this problem. It requires existing nodes in the mesh to sign a request for a new node to join. Although this doesn't address ACL tampering and some of the other things, it does represent a significant help with the most significant concern. However, tailnet lock is in alpha, only available on the Enterprise plan, and has a waitlist, so I have been unable to test it. Any Tailscale node can request the IP addresses belonging to any other Tailscale node. The Tailscale control plane captures, and exposes to you, this information about every node in your network: the OS hostname, IP addresses and port numbers, operating system, creation date, last seen timestamp, and NAT traversal parameters. You can optionally enable service data capture as well, which sends data about open ports on each node to the control plane. Tailscale likes to highlight their key expiry and rotation feature. By default, all keys expire after 180 days, and traffic to and from the expired node will be interrupted until they are renewed (basically, you re-login with your provider and do a renew operation). Unfortunately, the only mention I can see of warning of impending expiration is in the Windows client, and even there you need to edit a registry key to get the warning more than the default 24 hours in advance. In short, it seems likely to cut off communications when it's most important. You can disable key expiry on a per-node basis in the admin console web interface, and I mostly do, due to not wanting to lose connectivity at an inopportune time.
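For context on the ACL tampering concern, the tailnet policy file is a HuJSON document edited through the admin console; a minimal hedged sketch (identities, addresses, and ports are placeholders):
{
  "acls": [
    // Allow one user to reach SSH on one server, and nothing else:
    {"action": "accept", "src": ["alice@example.com"], "dst": ["100.64.0.10:22"]}
  ]
}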

Tailscale: Connectivity and NAT traversal When thinking about reliability, the primary consideration here is being able to reach the Tailscale control plane. While it is possible in limited circumstances to reach nodes without the Tailscale control plane, it is a fairly brittle setup and notably will not survive a client restart. So if you use Tailscale to reach other nodes on your LAN, that won't work unless your Internet is up and the control plane is reachable. Assuming your Internet is up and Tailscale's infrastructure is up, there is little to be concerned with. Your own comfort level with cloud providers and your Internet should guide you here. Tailscale wrote a fantastic article about NAT traversal and they, predictably, do very well with it. Tailscale prefers UDP but falls back to TCP if needed. Broker (DERP) servers step in as a last resort, and Tailscale clients automatically select the best ones. I'm not aware of anything that is more successful with NAT traversal than Tailscale. This maximizes the situations in which a direct P2P connection can be used without a broker. I have found Tailscale to be a bit slow to notice changes in network topology compared to Yggdrasil, and it sometimes needs a kick in the form of restarting the client process to re-establish communications after a network change. However, it's possible (maybe even probable) that if I'd waited a bit longer, it would have sorted this all out.
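When debugging this sort of thing, the client can also report what kind of NAT it is behind and which DERP region is nearest:
tailscale netcheck   # reports UDP reachability, NAT mapping behaviour, and DERP latencies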

Tailscale: Sharing with friends I touched on the funnel feature earlier. The sharing feature lets you give an invite to an outsider. By default, a person accepting a share can make only outgoing connections to the network they're invited to, and cannot receive incoming connections from that network, which makes sense. When sharing an exit node, you get a checkbox that lets you share access to the exit node as well. Of course, the person accepting the share needs to install the Tailscale client. The combination of funnel and sharing makes Tailscale the best for ad-hoc sharing.

Tailscale: DNS Tailscale's DNS is called MagicDNS. It runs as a layer atop your standard DNS, taking over /etc/resolv.conf on Linux, and provides resolution of mesh hostnames and some other features. The concept is pretty slick. It is also a bit flaky on Linux; dueling programs want to write to /etc/resolv.conf. I can't really say this is entirely Tailscale's fault; they document the problem and some workarounds. I would love to be able to add custom records to this service; for instance, to override the public IP for a service to use the in-mesh IP. Unfortunately, that's not yet possible. However, MagicDNS can query existing nameservers for certain domains in a split DNS setup.

Tailscale: Source code, pricing, and portability Tailscale is almost fully open source and the client is highly portable. The client is open source (BSD 3-clause) on open source platforms, and closed source on closed source platforms. The DERP servers are open source. The coordination server is closed source, although there is an open source coordination server called Headscale (also BSD 3-clause) made available with Tailscale's blessing and informal support. It supports most, but not all, features in the Tailscale coordination server. Tailscale's pricing (which does not apply when using Headscale) provides a free plan for 1 user with up to 20 devices. A Personal Pro plan expands that to 100 devices for $48 per year - not a bad deal at $4/mo. A Community on Github plan also exists, and then there are more business-oriented plans as well. See the pricing page for details. As a small note, I appreciated Tailscale's install script. It properly added Tailscale's apt key in a way that it can only be used to authenticate the Tailscale repo, rather than as a systemwide authenticator. This is a nice touch and speaks well of their developers.

Tailscale conclusions Tailscale is tops in sharing and has a broad feature set and excellent documentation. Like other solutions with a centralized control plane, device communications can stop working if the control plane is unreachable, and the threat model of the control plane should be carefully considered.

Zerotier Zerotier is a close competitor to Tailscale, and is similar to it in a lot of ways. So rather than duplicate all of the Tailscale information here, I'm mainly going to describe how it differs from Tailscale. The primary difference between the two is that Zerotier emulates an Ethernet network via a Linux tap interface, while Tailscale emulates a TCP/IP network via a Linux tun interface. However, Zerotier has a number of things that make it a somewhat imperfect Ethernet emulator. For one, it has a problem with broadcast amplification; the machine sending the broadcast sends it to all the other nodes that should receive it (up to a set maximum). I wouldn't want to have a lot of programs broadcasting on a slow link. While in theory this could let you run Netware or DECNet across Zerotier, I'm not really convinced there's much call for that these days, and Zerotier is clearly IP-focused as it allocates IP addresses and such anyhow. Zerotier provides special support for emulated ARP (IPv4) and NDP (IPv6). While you could theoretically run Zerotier as a bridge, this eliminates the zero trust principle, and Tailscale supports subnet routers, which provide much of the same feature set anyhow. A somewhat obscure feature, but possibly useful, is Zerotier's built-in support for multipath WAN for the public interface. This actually lets you do a somewhat basic kind of channel bonding for WAN.

Zerotier: Security and Privacy The picture here is similar to Tailscale, with the difference that you can create a Zerotier-local account rather than relying on cloud authentication. I was unable to find as much detail about Zerotier as I could about Tailscale - notably, I couldn't find anything about how sticky an IP address is. However, the configuration screen lets me delete a node and assign additional arbitrary IPs within a subnet to other nodes, so I think the assumption here is that if your Zerotier account (or the Zerotier control plane) is compromised, an attacker could remove a legit device, add a malicious one, and assign the previous IP of the legit device to the malicious one. I'm not sure how to mitigate against that risk, as firewalling specific IPs is ineffective if an attacker can simply take them over. Zerotier also lacks anything akin to Tailnet Lock. For this reason, I didn't proceed much further in my Zerotier evaluation.

Zerotier: Connectivity and NAT traversal Like Tailscale, Zerotier has NAT traversal with STUN. However, it looks like it's more limited than Tailscale's, and in particular is incompatible with the double NAT that is often seen these days. Zerotier operates brokers ("root servers") that can do relaying, including TCP relaying. So you should be able to connect even from hostile networks, but you are less likely to form a P2P connection than with Tailscale.

Zerotier: Sharing with friends I was unable to find any special features relating to this in the Zerotier documentation. Therefore, it would be at the same level as Yggdrasil: possible, maybe even not too difficult, but without any specific help.

Zerotier: DNS Unlike Tailscale, Zerotier does not support automatically adding DNS entries for your hosts. Therefore, your options are approximately the same as Yggdrasil, though with the added option of pushing configuration pointing to your own non-Zerotier DNS servers to the client.

Zerotier: Source code, pricing, and portability The client ZeroTier One is available on Github under a custom business source license which prevents you from using it in certain settings. This license would preclude it being included in Debian. Their library, libzt, is available under the same license. The pricing page mentions a community edition for self hosting, but the documentation is sparse and it was difficult to understand what its feature set really is. The free plan lets you have 1 user with up to 25 devices. Paid plans are also available.

Zerotier conclusions Frankly, I don't see much reason to use Zerotier. The virtual Ethernet model seems to be a weird hybrid that doesn't bring much value. I'm concerned about the implications of a compromise of a user account or the control plane, and it lacks a lot of Tailscale features (MagicDNS and sharing). The only thing it may offer in particular is multipath WAN, but that's esoteric enough, and also solvable at other layers, that it doesn't seem all that compelling to me. Add to that the strange license and, to me anyhow, I don't see much reason to bother with it.

Netmaker Netmaker is one of the projects that is making noise these days. Netmaker is the only one here that is a wrapper around in-kernel Wireguard, which can make a performance difference when talking to peers on a 1Gbps or faster link. Also, unlike other tools, it has an ingress gateway feature that lets people that don't have the Netmaker client, but do have Wireguard, participate in the VPN. I believe I also saw a reference somewhere to nodes as routers as with Yggdrasil, but I'm failing to dig it up now. The project is in a bit of an early state; you can sign up for an upcoming closed beta with a SaaS host, but really you are generally pointed to self-hosting using the code in the github repo. There are community and enterprise editions, but it's not clear how to actually choose. The server has a bunch of components: binary, CoreDNS, database, and web server. It also requires elevated privileges on the host, in addition to a container engine. Contrast that to the single binary that some others provide. It looks like releases are frequent, but sometimes break things, and have a somewhat more laborious upgrade process than most. I don't want to spend a lot of time managing my mesh. So because of the heavy needs of the server, the upgrades being labor-intensive, and it taking over iptables and such on the server, I didn't proceed with a more in-depth evaluation of Netmaker. It has a lot of promise, but for me, it doesn't seem to be in a state that will meet my needs yet.

Nebula Nebula is an interesting mesh project that originated within Slack and seems to still be primarily sponsored by Slack, but is also being developed by Defined Networking (though their product looks early right now). Unlike the other tools in this section, Nebula doesn't have a web interface at all. Defined Networking looks likely to provide something of a SaaS service, but for now, you will need to run a broker (a "lighthouse") yourself; perhaps on a $5/mo VPS. Due to the poor firewall traversal properties, I didn't do a full evaluation of Nebula, but it still has a very interesting design.

Nebula: Security and Privacy Since Nebula lacks a traditional control plane, the root of trust in Nebula is a CA (certificate authority). The documentation gives this example of setting it up:
./nebula-cert sign -name "lighthouse1" -ip "192.168.100.1/24"
./nebula-cert sign -name "laptop" -ip "192.168.100.2/24" -groups "laptop,home,ssh"
./nebula-cert sign -name "server1" -ip "192.168.100.9/24" -groups "servers"
./nebula-cert sign -name "host3" -ip "192.168.100.10/24"
So the cert contains your IP, hostname, and group allocation. Each host in the mesh gets your CA certificate, and the per-host cert and key generated from each of these steps. This leads to a really nice security model. Your CA is the gatekeeper to what is trusted in your mesh. You can even have it airgapped or something to make it exceptionally difficult to breach the perimeter. Nebula contains an integrated firewall. Because the ability to keep out unwanted nodes is so strong, I would say this may be the one mesh VPN you might consider using without bothering with an additional on-host firewall. You can define static mappings from a Nebula mesh IP to a clearnet IP. I haven't found information on this, but theoretically, if NAT traversal isn't required, these static mappings may allow Nebula nodes to reach each other even if the Internet is down. I don't know if this is truly the case, however.
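As a sketch of the model (group names and ports are placeholders; the firewall section follows the structure in the Nebula config documentation):
./nebula-cert ca -name "myorg"   # create the CA once, ideally on an offline machine
# Firewall excerpt from a node's config.yml, keyed to cert groups:
firewall:
  outbound:
    - port: any
      proto: any
      host: any
  inbound:
    # Only hosts whose certs carry the "laptop" group may reach SSH here:
    - port: 22
      proto: tcp
      groups:
        - laptop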

Nebula: Connectivity and NAT traversal This is a weak point of Nebula. Nebula sends all traffic over a single UDP port; there is no provision for using TCP. This is an issue at certain hotel and other public networks which open only TCP egress ports 80 and 443. I couldn't find a lot of detail on what Nebula's NAT traversal is capable of, but according to a certain Github issue, this has been a sore spot for years and isn't as capable as Tailscale. You can designate nodes in Nebula as brokers (relays). The concept is the same as Yggdrasil, but it's less versatile. You have to manually designate what relay to use. It's unclear to me what happens if different nodes designate different relays. Keep in mind that this always happens over a UDP port.
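The manual designation looks roughly like this in config.yml (per the Nebula relay documentation; 192.168.100.1 stands in for the mesh IP of the relay node):
# On the node acting as a relay:
relay:
  am_relay: true
# On a node that wants to be reachable through that relay:
relay:
  use_relays: true
  relays:
    - 192.168.100.1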

Nebula: Sharing with friends There is no particular support here.

Nebula: DNS Nebula has experimental DNS support. In contrast with Tailscale, which has an internal DNS server on every node, Nebula only runs a DNS server on a lighthouse. This means that it can't forward requests to a DNS server that's upstream for your laptop's particular current location. Actually, Nebula's DNS server doesn't forward at all. It also doesn't resolve its own name. The Nebula documentation makes reference to using multiple lighthouses, which you may want to do for DNS redundancy or performance, but it's unclear to me if this would make each lighthouse form a complete picture of the network.

Nebula: Source code, pricing, and portability Nebula is fully open source (MIT). It consists of a single Go binary and configuration. It is fairly portable.

Nebula conclusions I am attracted to Nebula's unique security model. I would probably be more seriously considering it if not for the lack of support for TCP and poor general NAT traversal properties. Its datacenter connectivity heritage does show through.

Roll your own and hybrid Here is a grab bag of ideas:

Running Yggdrasil over Tailscale One possibility would be to use Tailscale for its superior NAT traversal, then allow Yggdrasil to run over it. (You will need a firewall to prevent Tailscale from trying to run over Yggdrasil at the same time!) This creates a closed network with all the benefits of Yggdrasil, yet getting the NAT traversal from Tailscale. Drawbacks might be the overhead of the double encryption and double encapsulation. A good Yggdrasil peer may wind up being faster than this anyhow.
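The Yggdrasil side of this is just a peer entry pointing at a tailnet address; a sketch with placeholder values (100.64.0.1 standing in for the other node's Tailscale IP):
# yggdrasil.conf on one node: listen on a fixed port...
Listen: ["tcp://0.0.0.0:12345"]
# ...and on the other node: peer with it across the tailnet:
Peers: ["tcp://100.64.0.1:12345"]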

Public VPN provider for NAT traversal A public VPN provider such as Mullvad will often offer incoming port forwarding and nodes in many cities. This could be an attractive way to solve a bunch of NAT traversal problems: just use one of those services to get you an incoming port, and run whatever you like over that. Be aware that a number of public VPN clients have a kill switch to prevent any traffic from egressing without using the VPN; see, for instance, Mullvad's. You'll need to disable this if you are running a mesh atop it.

Other

Combining with local firewalls For most of these tools, I recommend using a local firewall in conjunction with them. I have been using firehol and find it to be quite nice. This means you don't have to trust the mesh, the control plane, or whatever. The catch is that you do need your mesh VPN to provide a strong association between IP address and node. Most, but not all, do.
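As an illustration of the idea, a fragment of a firehol.conf treating the mesh interface as semi-trusted (the interface name and services are placeholders; firehol syntax per its documentation):
interface tailscale0 mesh
    policy drop                  # drop anything not explicitly allowed
    server "ssh http" accept     # inbound: only SSH and HTTP from the mesh
    client all accept            # outbound: this host may initiate anything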

Performance I tested some of these for performance using iperf3 on a 2.5Gbps LAN. Here are the results. All speeds are in Mbps.
Tool                  iperf3 (default)   iperf3 -P 10   iperf3 -R
Direct (no VPN)       2406               2406           2764
Wireguard (kernel)    1515               1566           2027
Yggdrasil              892               1126           1105
Tailscale              950               1034           1085
Tinc                   296                300            277
You can see that Wireguard was significantly faster than the other options. Tailscale and Yggdrasil were roughly comparable, and Tinc was terrible.
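For reference, the three columns correspond to invocations along these lines, run against an iperf3 server (iperf3 -s) on the far end; <peer> is a placeholder:
iperf3 -c <peer>         # default: a single stream, client sends
iperf3 -c <peer> -P 10   # ten parallel streams
iperf3 -c <peer> -R      # reverse mode: the server sends to the client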

IP collisions When you are communicating over a network such as these, you need to trust that the IP address you are communicating with belongs to the system you think it does. That trust matters because of two malicious-actor scenarios:
  1. Someone compromises one machine on your mesh and reconfigures it to impersonate a more important one
  2. Someone connects an unauthorized system to the mesh, taking over a trusted IP, and uses the privileges of the trusted IP to access resources
To summarize the state of play as highlighted in the reviews above:
  • Yggdrasil derives IPv6 addresses from a public key
  • tinc allows any node to set any IP
  • Tailscale IPs aren't user-assignable, but the assignment algorithm is unknown
  • Zerotier allows any IP to be allocated to any node at the control plane
  • I don't know what Netmaker does
  • Nebula IPs are baked into the cert and signed by the CA, but I haven't verified the enforcement algorithm
So this discussion really only applies to Yggdrasil and Tailscale. tinc and Zerotier lack detailed IP security, while Nebula expects IP allocations to be handled outside of the tool and baked into the certs (therefore enforcing rigidity at that level). So the question for Yggdrasil and Tailscale is: how easy is it to commandeer a trusted IP? Yggdrasil has a brief discussion of this. In short, Yggdrasil offers you both a dedicated IP and a rarely-used /64 prefix which you can delegate to other machines on your LAN. Obviously, by taking the dedicated IP, a lot more bits are available for the hash of the node's public key, making collisions technically impractical, if not outright impossible. However, if you use the /64 prefix, a collision may be more possible. Yggdrasil's hashing algorithm includes some optimizations to make this more difficult. Yggdrasil includes a genkeys tool that uses more CPU cycles to generate keys that are maximally difficult to collide with. Tailscale doesn't document their IP assignment algorithm, but I think it is safe to say that the larger the subnet you use, the better. If you try to use a /24 for your mesh, it is certainly conceivable that an attacker could remove your trusted node, then just manually add the 240 or so machines it would take to get that IP reassigned. It might be a good idea to use a purely IPv6 mesh with Tailscale to minimize this problem as well. So, I think the risk is low in the default configurations of both Yggdrasil and Tailscale (certainly lower than with tinc or Zerotier). You can drive the risk even lower with both.

Final thoughts For my own purposes, I suspect I will remain with Yggdrasil in some fashion. Maybe I will just take the small performance hit that using a relay node implies. Or perhaps I will get clever and use an incoming VPN port forward or go over Tailscale. Tailscale was the other option that seemed most interesting. However, living in a region with Internet that goes down more often than I'd like, I would like to just be able to send as much traffic over a mesh as possible, trusting that if the LAN is up, the mesh is up. I have one thing that really benefits from performance in excess of Yggdrasil or Tailscale: NFS. That's between two machines that never leave my LAN, so I will probably just set up a direct Wireguard link between them. Heck of a lot easier than trying to do Kerberos! Finally, I wrote this intending to be useful. I dealt with a lot of complexity and under-documentation, so it's possible I got something wrong somewhere. Please let me know if you find any errors.
This blog post is a copy of a page on my website. That page may be periodically updated.

3 April 2023

Russ Allbery: Review: The Nordic Theory of Everything

Review: The Nordic Theory of Everything, by Anu Partanen
Publisher: Harper
Copyright: 2016
Printing: June 2017
ISBN: 0-06-231656-7
Format: Kindle
Pages: 338
Anu Partanen is a Finnish journalist who immigrated to the United States. The Nordic Theory of Everything, subtitled In Search of a Better Life, is an attempt to explain the merits of Finnish approaches to government and society to a US audience. It was her first book. If you follow US policy discussion at all, you have probably been exposed to many of the ideas in this book. There was a time when the US left was obsessed with comparisons between the US and Nordic countries, and while that obsession has faded somewhat, Nordic social systems are still discussed with envy and treated as a potential model. Many of the topics of this book are therefore predictable: parental leave, vacation, health care, education, happiness, life expectancy, all the things that are far superior in Nordic countries than in the United States by essentially every statistical measure available, and which have been much-discussed. Partanen brings two twists to this standard analysis. The first is that this book is part memoir: she fell in love with a US writer and made the decision to move to the US rather than asking him to move to Finland. She therefore experienced the transition between social and government systems first-hand and writes memorably on the resulting surprise, trade-offs, anxiety, and bafflement. The second, which I've not seen previously in this policy debate, is a fascinating argument that Finland is a far more individualistic country than the United States precisely because of its policy differences.
Most people, including myself, assumed that part of what made the United States a great country, and such an exceptional one, was that you could live your life relatively unencumbered by the downside of a traditional, old-fashioned society: dependency on the people you happened to be stuck with. In America you had the liberty to express your individuality and choose your own community. This would allow you to interact with family, neighbors, and fellow citizens on the basis of who you were, rather than on what you were obligated to do or expected to be according to old-fashioned thinking. The longer I lived in America, therefore, and the more places I visited and the more people I met, and the more American I myself became, the more puzzled I grew. For it was exactly those key benefits of modernity (freedom, personal independence, and opportunity) that seemed, from my outsider's perspective, in a thousand small ways to be surprisingly missing from American life today. Amid the anxiety and stress of people's daily lives, those grand ideals were looking more theoretical than actual.
The core of this argument is that the structure of life in the United States essentially coerces dependency on other people: employers, spouses, parents, children, and extended family. Because there is no universally available social support system, those relationships become essential for any hope of a good life, and often for survival. If parents do not heavily manage their children's education, there is a substantial risk of long-lasting damage to the stability and happiness of their life. If children do not care for their elderly parents, they may receive no care at all. Choosing not to get married often means choosing precarity and exhaustion because navigating society without pooling resources with someone else is incredibly difficult.
It was as if America, land of the Hollywood romance, was in practice mired in a premodern time when marriage was, first and foremost, not an expression of love, but rather a logistical and financial pact to help families survive by joining resources.
Partanen contrasts this with what she calls the Nordic theory of love:
What Lars Trägårdh came to understand during his years in the United States was that the overarching ambition of Nordic societies during the course of the twentieth century, and into the twenty-first, has not been to socialize the economy at all, as is often mistakenly assumed. Rather, the goal has been to free the individual from all forms of dependency within the family and in civil society: the poor from charity, wives from husbands, adult children from parents, and elderly parents from their children. The express purpose of this freedom is to allow all those human relationships to be unencumbered by ulterior motives and needs, and thus to be entirely free, completely authentic, and driven purely by love.
She sees this as the common theme through most of the policy differences discussed in this book. The Finnish approach is to provide neutral and universal logistical support for most of life's expected challenges: birth, child-rearing, education, health, unemployment, and aging. This relieves other social relations (family, employer, church) of the corrosive strain of dependency and obligation. It also ensures people's basic well-being isn't reliant on accidents of association.
If the United States is so worried about crushing entrepreneurship and innovation, a good place to start would be freeing start-ups and companies from the burdens of babysitting the nation's citizens.
I found this fascinating as a persuasive technique. Partanen embraces the US ideal of individualism and points out that, rather than being collectivist as the US right tends to assume, Finland is better at fostering individualism and independence because the government works to remove unnecessary premodern constraints on individual lives. The reason why so many Americans are anxious and frantic is not a personal failing or bad luck. It's because the US social system is deeply hostile to healthy relationships and individual independence. It demands a constant level of daily problem-solving and crisis management that is profoundly exhausting, nearly impossible to navigate alone, and damaging to the ideal of equal relationships. Whether this line of argument will work is another question, and I'm dubious for reasons that Partanen (probably wisely) avoids. She presents the Finnish approach as a discovery that the US would benefit from, and the US approach as a well-intentioned mistake. I think this is superficially appealing; almost all corners of US political belief at least give lip service to individualism and independence. However, advocates of political change will eventually need to address the fact that many US conservatives see this type of social coercion as an intended feature of society rather than a flaw. This is most obvious when one looks at family relationships. Partanen treats the idea that marriage should be a free choice between equals rather than an economic necessity as self-evident, but there is a significant strain of US political thought that embraces punishing people for not staying within the bounds of a conservative ideal of family. One will often find, primarily but not exclusively among the more religious, a contention that the basic unit of society is the (heterosexual, patriarchal) family, not the individual, and that the suffering of anyone outside that structure is their own fault. Not wanting to get married, be the primary caregiver for one's parents, or abandon a career in order to raise children is treated as malignant selfishness and immorality rather than a personal choice that can be enabled by a modern social system. Here, I think Partanen is accurate to identify the Finnish social system as more modern. It embraces the philosophical concept of modernity, namely that social systems can be improved and social structures are not timeless. This is going to be a hard argument to swallow for those who see the pressure towards forming dependency ties within families as natural, and societal efforts to relieve those pressures as government meddling. In that intellectual framework, rather than an attempt to improve the quality of life, government logistical support is perceived as hostility to traditional family obligations and an attempt to replace "natural" human ties with "artificial" dependence on government services. Partanen doesn't attempt to have that debate. Two other things struck me in this book. The first is that, in Partanen's presentation, Finns expect high-quality services from their government and work to improve it when it falls short. This sounds like an obvious statement, but I don't think it is in the context of US politics, and neither does Partanen. She devotes a chapter to the topic, subtitled "Go ahead: ask what your country can do for you." This is, to me, one of the most frustrating aspects of US political debate.
Our attitude towards government is almost entirely hostile and negative even among the political corners that would like to see government do more. Failures of government programs are treated as malice, malfeasance, or inherent incompetence: in short, signs the program should never have been attempted, rather than opportunities to learn and improve. Finland had mediocre public schools, decided to make them better, and succeeded. The moment US public schools start deteriorating, we throw much of our effort into encouraging private competition and dismantling the public school system. Partanen doesn't draw this connection, but I see a link between the US desire for market solutions to societal problems and the level of exhaustion and anxiety that is so common in US life. Solving problems by throwing them open to competition is a way of giving up, of saying we have no idea how to improve something and are hoping someone else will figure it out for a profit. Analyzing the failures of an existing system and designing incremental improvements is hard and slow work. Throwing out the system and hoping some corporation will come up with something better is disruptive but easy. When everyone is already overwhelmed by life and devoid of energy to work on complex social problems, it's tempting to give up on compromise and coalition-building and let everyone go their separate ways on their own dime. We cede the essential work of designing a good society to start-ups. This creates a vicious cycle: the resulting market solutions are inevitably gated by wealth and thus precarious and artificially scarce, which in turn creates more anxiety and stress. The short-term energy savings from not having to wrestle with a hard problem is overwhelmed by the long-term cost of having to navigate a complex and adversarial economic relationship. That leads into the last point: schools. There's a lot of discussion here about school quality and design, which I won't review in detail but which is worth reading. What struck me about Partanen's discussion, though, is how easy the Finnish system is to use. Finnish parents just send their kids to the most convenient school and rarely give that a second thought. The critical property is that all the schools are basically fine, and therefore there is no need to place one's child in an exceptional school to ensure they have a good life. It's axiomatic in the US that more choice is better. This is a constant refrain in our political discussion around schools: parental choice, parental control, options, decisions, permission, matching children to schools tailored for their needs. Those choices are almost entirely absent in Finland, at least in Partanen's description, and the amount of mental and emotional energy this saves is astonishing. Parents simply don't think about this, and everything is fine. I think we dramatically underestimate the negative effects of constantly having to make difficult decisions with significant consequences, and drastically overstate the benefits of having every aspect of life be full of major decision points. To let go of that attempt at control, however illusory, people have to believe in a baseline of quality that makes the choice less fraught. That's precisely what Finland provides by expecting high-quality social services and working to fix them when they fall short, an effort that the United States has by and large abandoned. 
A lot of non-fiction books could be turned into long articles without losing much substance, and I think The Nordic Theory of Everything falls partly into that trap. Partanen repeats the same ideas from several different angles, and the book felt a bit padded towards the end. If you're already familiar with the policy comparisons between the US and Nordic countries, you will have seen a lot of this before, and the book bogs down when Partanen strays too far from memoir and personal reactions. But the focus on individualism and eliminating dependency is new, at least to me, and is such an illuminating way to look at the contrast that I think the book is worth reading just for that. Rating: 7 out of 10

28 March 2023

Matt Brown: Ventilation Monitoring: Ensuring every space has clean, fresh air

The importance of clean, fresh indoor air is one of the most tangible takeaways of the Covid-19 pandemic. In addition to being an effective risk mitigation strategy for reducing the spread of respiratory illnesses, clean, fresh air is necessary to enable effective cognitive performance. Monitoring indoor air quality is relatively easy to do, but traditionally has not been a key focus. I believe air quality monitoring should be accessible for any indoor space, and for highly occupied indoor spaces should be provided on a continuous basis. This post explores the need and an opportunity for a business that can accelerate the adoption of ventilation monitoring through the following topics:

The importance of indoor air quality Clean, fresh air is fundamental to life and health. That might sound obvious, but unfortunately being obvious is not enough to ensure the air we breathe is in fact always clean and healthy. Repeated studies have revealed that in many cases the air you're breathing at school, on the bus, or at work, and probably also at home, falls well below the ideal of what clean, fresh air should be. Unclean air has potential long-term health impacts and has also been shown to lower cognitive performance, impacting the ability to learn and work, as well as increasing the risk of transmission of respiratory illnesses like Covid-19 and the flu. Ventilation (replacing old stale air with clean fresh air) is the most effective and economical method of improving and maintaining high indoor air quality. Most New Zealand buildings (including schools and houses) are designed to rely on manual ventilation (opening windows), while newer buildings, often including larger or commercial buildings, may use mechanical ventilation involving fans and ducts. Mechanical ventilation including filtration may also be required in situations where the outdoor air is not clean and fresh, such as in a city or next to a busy intersection.

Observing the invisible Overall air quality is a complex topic involving many contributing factors, many of which are invisible and not perceptible to us until well after adverse effects or irritation occur. This complexity and lack of visible signal is a large contributing factor to the ignorance and lack of attention towards indoor air quality that is prevalent in most buildings and indoor spaces today. Our attention is biased towards the risks that we can see, and this default bias has not been helped by hesitation and resistance to the idea that aerosol transmission and air quality are important factors in preventing disease transmission, which has only recently started to change. Zeynep Tufekci has a great overview that provides fascinating context for how an overreaction to the early incorrect theories of bad air and miasma causing disease contributed to aerosol transmission and air quality being incorrectly neglected for so long. Correcting this history of inattention to indoor air quality is going to take time and effort, but one significant step that we can take to help start the journey to ensuring all indoor spaces have clean, healthy air is to make the invisible part visible. The concentration of carbon dioxide (CO2) in a space is an incredibly effective and easy-to-measure proxy for the ventilation of a space. The atmospheric background level of CO2 is around 420 parts per million (ppm), while our exhaled breath has concentrations as high as 40,000 ppm. Without effective ventilation, one or more people breathing in an enclosed space will rapidly lead to an observable increase in CO2 concentration, which in turn provides a signal that the ventilation is insufficient and needs to be improved. Monitoring CO2 and improving ventilation is not a panacea for all possible air quality issues, but for the majority of buildings and indoor spaces, using CO2 as a proxy for ventilation and increasing ventilation when CO2 levels rise above recommended levels is a simple, effective and achievable approach that will deliver improvements in cognitive performance and reduction in the risk of disease transmission with few, if any, downsides or risks. See this Public Health Communication Centre briefing for a more detailed explanation.
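As a rough back-of-envelope illustration (the figures are assumptions: one seated adult exhaling about 0.018 m³ of CO2 per hour, a 50 m³ room, and a poor ventilation rate of 0.5 air changes per hour):
steady-state CO2 ≈ outdoor CO2 + generation rate / ventilation rate
                ≈ 420 ppm + 0.018 m³/h ÷ (0.5/h × 50 m³)
                ≈ 420 ppm + 720 ppm ≈ 1140 ppm
That is well above the roughly 800 ppm commonly cited as an indoor target, which is exactly the kind of signal a CO2 monitor makes visible.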

Adding clean air to our hygiene practices We have well established expectations of hygiene for the food we eat and the water we drink, and these expectations are codified in regulations that ensure those providing these services do so in a way that gives us confidence that we're not going to be at risk of illness. You may recall seeing food grade ratings prominently displayed on the walls of restaurants and cafes that you visit as an example of this. Why should the air we breathe be treated any differently? I think there is a strong argument that indoor air quality deserves regulation, both of the absolute quality of the air and ensuring that the practices and achieved air quality are clearly advertised and available. Ventilation monitoring via measurement of CO2 concentration provides an effective and achievable method that can be used to achieve this, and countries like Belgium and Japan are already starting to regulate indoor air quality. In the UK, the independent SAGE group of scientists has published "Scores on the Doors", a proposal which demonstrates how CO2 monitoring can be helpful in providing information about the air quality of indoor spaces. Unfortunately there is no movement in any of these directions in New Zealand yet, and no sign that regulation or even a basic campaign to raise awareness of ventilation and air quality is even being planned. This is disappointing, but even if such work was planned, it would still require appropriate ventilation monitoring products and services to enable it, and while there are some options available, it is not a fully solved space yet.

Existing ventilation monitoring options Until recently the available offerings for ventilation monitoring have sat at two distinct ends of the price and quality spectrum:
  • Handheld air quality meters advertised as measuring CO2, but in reality reporting only an approximation. These meters do not contain actual CO2 sensors, and only approximate CO2 levels based on measurements of other components of the air. While cheap (often less than $100), these meters are not useful for providing reliable data that can be systematically used to assess and improve ventilation and should be avoided.
  • High-end building management systems (BMS), and industrial measurement products targeted at large buildings such as offices or commercial applications such as food production. These systems require specialist installation, often integrated with large whole-building air conditioning systems. These systems, if appropriately configured, can be a great solution for the types of buildings and spaces that can afford them, but by their nature and cost, they do not offer a solution for the majority of smaller buildings and indoor spaces where we tend to spend a lot of our time.
Over the last few years a growing number of companies have developed products that fit in between the unreliable air quality meters and the expensive BMS/industrial measurement products. Promising NZ-based options in this space include Air Suite, Tether and Monkeytronics. These products are wall-mountable, resemble a smoke alarm and utilise a WiFi network to report their measurements to a supporting web service. Pricing varies between $200 and $300 ex GST per unit. Aranet, while not NZ based, provides a handheld monitor, the Aranet4 Home, which is well regarded for quality and accuracy. Aranet4 Home devices are the most expensive in this space, retailing at $386 ex GST, and offer a clunkier and less convenient set of connectivity options via a Bluetooth connection to an associated phone. To obtain similar reporting functionality to the other products requires upgrading to their Pro model and purchasing a separate base station at a combined cost of $1255 ex GST. Outside the commercial product offerings are a number of open source DIY options, which can be built by anyone with basic electronics knowledge. AirGradient is a leading example based in Thailand, and within New Zealand Oliver Seiler's CO2 Monitor provides similar functionality. These open source options have a parts cost in the $100-$150 range, depending on volume built, and provide high-quality measurements via trusted CO2 sensors while also offering huge flexibility in terms of how they operate, interact with users and potential supporting web services.

An opportunity: Small businesses and organisations

While a growing number of high-quality CO2 monitors have the potential to help drive increased adoption of ventilation monitoring, the plethora of small businesses and organisations that own, operate and manage many of the indoor spaces we visit on a day-to-day basis do not appear to be well served by the existing products. To deploy ventilation monitoring, a small business or organisation needs to first become aware of the need or demand for it, and then have a simple and easy path to acquire and install the monitor and access the data. Little to no marketing or demand generation appears to be targeted towards this market by the existing businesses, and tellingly, several of the products are not directly available for sale, requiring interaction with a salesperson to purchase. This indicates a focus on selling to larger customers who have a campus or portfolio of buildings and will purchase in larger quantities than the typical small business or organisation will.

Small businesses and organisations are likely to occupy smaller buildings and spaces where manual ventilation is the prevalent method of improving and maintaining air quality. Maintaining clean, fresh air via manual ventilation requires the occupants of the space to receive an obvious and straightforward signal when action (opening windows, etc.) is required. While the products above all tend to provide some form of local feedback and display in the room, the indication and notification of when to take action are less obvious and prominent than would be ideal where manual ventilation is being relied upon.

Informally testing this opportunity with family and friends running small businesses over the last few months has resulted in promising feedback. One particular success story was the discovery of a fresh air duct on the air conditioning unit in a small office that had never been connected to the outside air and was simply recirculating air from the ceiling space back into the office! The resulting stuffiness and poor air quality had been noticed, but without the clear indication from the CO2 monitor that the air conditioning was actually making things worse rather than better, the underlying issue had not been understood. With the issue fixed and the duct now connected, that business is now enjoying much more productive and healthy working conditions.
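
To make the "obvious and straightforward signal" idea concrete, here is a sketch of the threshold logic a monitor aimed at manual ventilation could apply. The ppm bands are illustrative assumptions, roughly in line with commonly cited guidance, not values taken from any of the products above.

def ventilation_signal(co2_ppm: int) -> str:
    """Map a CO2 reading to a prominent, actionable indication.

    The bands are illustrative assumptions; published guidance varies,
    but ~800/1200 ppm are widely used starting points.
    """
    if co2_ppm < 800:
        return "GREEN: air is fresh"
    if co2_ppm < 1200:
        return "AMBER: open a window soon"
    return "RED: ventilate now"

In practice this indication wants to be a bright light or full-screen colour rather than a number on a chart, so that anyone in the room can act on it without interpretation.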

Next steps

Many small businesses and organisations are likely to have poor air quality, and opportunities for improvement similar to the example above are waiting to be found and fixed, yet the existing products are neither focused on nor ideal for the needs of this market. I have spent some time over the past six months building a basic CO2 monitoring service that I have used to deploy ventilation monitoring to our local school and a few other local businesses. There are a number of challenges that still need to be addressed in order to scale up, but I think there is a reasonable chance that I can build a viable business offering an attractive and useful solution that would accelerate the deployment of ventilation monitoring for small businesses and organisations. In an upcoming post, I will explain the foundations of the service that I have built to date, the challenges that need to be overcome, and how I plan to evolve the service from the current prototype into a sustainable, bootstrapped business.

27 March 2023

Russell Coker: Strange X11 Grabbing

A couple of days ago I upgraded my home server from Debian/Bullseye to Debian/Testing (soon to be Bookworm). Since then, KDE sessions on that system have had a problem where the input queue locks up: the mouse can move and mouse-over events work, but clicking the mouse or pressing keys on the keyboard does nothing. Various web pages suggested that the xdotool program (in the xdotool package in Debian) can address this. The problem is apparently programs grabbing the input and not letting it go.

The command xdotool key XF86LogGrabInfo causes the Xorg server to dump information on its "grabs". After running that command I looked in /var/log/Xorg.0.log and found that active grabs were only held by /usr/bin/kwin_x11 and /usr/bin/kglobalaccel5, so it seems like a KDE issue. Other systems running X11 with Debian/Testing (such as the laptop I'm using to write this blog post) don't have the problem, so it could be something related to the KDE configuration of the account used on that system. The command xdotool key XF86Ungrab is supposed to break out of such a grab, but for me it didn't do so.

On the same system, running KDE with Wayland works fine in this regard. Does Wayland do things differently and not allow this grabbing to block everything? Does KDE have an X11-specific bug? Is there a race condition that just gets triggered by the speed of Xorg on that system but not by the slightly different timings of Wayland? I might never find out.

I previously wrote about problems with Wayland/KDE on laptops [1]. Fortunately this bug happened to occur on a server, so the inability to reconfigure monitors isn't necessarily a deal breaker, although being unable to use some of the high-DPI settings for the 4K monitor it has may be an issue. It will be really annoying if some of the laptop configurations I support get this grabbing problem. But since that time I have learned of the kscreen-doctor command, which is included in Debian/Testing and can do some of the necessary things; it doesn't have a man page, so you have to run kscreen-doctor -h for documentation.
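
For convenience, the diagnostic steps above can be wrapped in a small script. This is a minimal Python sketch assuming xdotool is installed and Xorg logs to the traditional /var/log/Xorg.0.log path (rootless Xorg sessions log under ~/.local/share/xorg/ instead):

import subprocess
from pathlib import Path

LOG = Path("/var/log/Xorg.0.log")  # adjust for rootless Xorg sessions

# Ask the X server to dump information on its active grabs into the log.
subprocess.run(["xdotool", "key", "XF86LogGrabInfo"], check=True)

# Print the grab-related log lines so the offending client is easy to spot.
for line in LOG.read_text(errors="replace").splitlines():
    if "grab" in line.lower():
        print(line)

# XF86Ungrab is supposed to break a stuck grab (it did not help in the
# case described above, but it is cheap to try):
subprocess.run(["xdotool", "key", "XF86Ungrab"], check=True)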
